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ABSTRACT 


Finite  element  (FL)  and  finite  difference  (FD)  spatial 
differencing  schemes  for  linearly  elastic  materials  are  com- 
paied  on  a  common  basis.  Conditions  are  established  under 
which  the  two  methods  are  identical.  Features  commonly 
associated  with  FD  methods  arc  combined  with  a  FE  spatial 
t i eatment  to  obtain  an  explicit  time  stepping  algorithm  suit- 
anle  for  processing  3-1)  linearly  elastic  wave  calculations  on 
the  ILL  I AC  IV  system.  Based  on  projected  timing  estimates, 
the  ILLIAC  code  should  he  capable  of  processing  wave  calcula¬ 
tions  foi  a  10  -node  grid  at  the  rate  of  0.4  sec  per  time  step. 
The  algorithm  has  provisions  for  irregular  3-D  geometries, 
aitificial  damping  for  suppressing  spurious  numerical  oscilla¬ 
tions,  and  a  nonreflecting  boundary  condition. 

Test  calculations  are  presented  to  compare  FE  and  FD 
codes  at  S  .  Comparisons  are  made  for  two  problems:  A 
spherically  symmetric  explosion  with  an  exponentially  decay¬ 
ing  step  pressure  and  Lamb's  problem  -  an  impulse  loading  ap¬ 
plied  to  the  free  surface  of  a  homogeneous  half  space.  Plots 
are  presented  to  show  the  2-D  displacement,  velocity,  and 
kinetic  energy  fields  at  various  stages  of  the  FE  calculations 
for  Lamb's  problem.  As  predicted  from  theoretical  compari¬ 
sons,  no  significant  advantages  in  either  scheme  became  ap¬ 
parent  for  the  problems  tested. 

Test  calculations  are  presented  for  exploring  certain 
features  of  the  3-D  time  stepping  algorithm  both  on  S3's 
Uni vac  1108  and  on  the  ILLIAC  simulator  at  UCSD.  The  algorithm 
has  proven  to  he  very  fast  on  the  Univac,  time  stepping  at  the 
rate  of  0.5  sec  per  time  step  for  a  particular  404-node  3-D 
£ 1  id.  Sample  3-D  pioblems  are  presented  to  demonstrate  the 
use  o!  artificial  damping,  to  illustrate  the  effectiveness 
o.  the  noniei  lectin g  boundary  condition,  and  to  test  the 


ability  cf  the  numerical  scheme  for  propagating  a  step 
velocity  (particle  velocity)  through  160  elements  without 
excessive  spreading  at  the  wave  front. 

Uhat  appears  to  be  a  highly  efficient  algorithm  lias 
been  developed  for  multiplying  a  very  large  sparse  matrix 
times  a  core-contained  column  vector  on  the  ILL1AC  IV.  The 
algorithm  is  independent  of  the  location  of  elements  in  the 
sparse  matrix.  Howeter,  a  tedious  process  is  required  to 
arrange  the  nonzero  elements  of  the  matrix  on  the  disk  storage. 
The  algorithm  is  particularly  effective  for  repetitive  multi¬ 
plies  involving  the  same  sparse  matrix  and,  therefore,  should 
prove  valuable  for  iterative  and  time  stepping  calculations. 
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INTRODUCTION 


I. 

Dining  the  past  deca.de,  the  time  period  when  high  speed 
digital  computers  became  readily  available  in  the  U.  S.,  we 
have  seen  remarkable  progress  in  our  ability  to  analyze  stress 
waves  in  solids.  Our  analysis  techniques  are  no  longer  restricted 
to  linear  processes  and  geometrically  simple  shapes.  With  few 
exceptions,  if  stresses  are  induced  into  an  elastic  medium  that 
can  be  approximated  using  one-  or  two-dimensional  geometry, 
then  state  of  the  art  computer  codes  can  be  applied  to  numeri¬ 
cally  simulate  the  process. 

If,  on  the  other  hand,  either  the  geometry  or  the  source 
of  excitation  cannot  be  suitably  approximated  using  only  two 
spatial  dimensions,  then  we  find  severe  limitations  in  our 
ability  to  predict  the  resulting  motions  using  existing 
computer  codes.  Conventional  third-generation  computer..  are 
heavily  burdened  by  the  massive  calculations  involved  in 
simulating  3-D  wave  propagation. 

The  advent  of  the  ILLIAC  IV  computing  system  will  most 
certainly  prove  to  be  a  valuable  asset  for  processing  the  3-D 
calculations  in  a  parallel  mode.  It  is  our  opinion,  based  on 
six  months  of  designing  and  programming  parallel  algorithms 
for  processing  wave  calculations,  that  the  ILLIAC  concept  is 
well  suited  for  the  task.  We  anticipate  processing  3-D  wave 
computations  on  the  ILLIAC  IV  about  40  times  faster  than  on 
conventional  computers,  about  10  times  faster  than  on  the 
CDC  7600. 

We  expect  to  achieve  further  savings  over  existing  3-D 
codes  as  the  result  of  the  algorithm  that  is  being  employed 
in  the  ILLIAC  code.  The  state-of-the-art  in  performing  3-D 
wave  calculations  is  not  sufficiently  well  developed  to  state 
how  much  savings  is  to  be  realized  from  our  algorithm.  It 
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is  instructive,  however,  to  note  that  the  number  of  multiply 
and  add  operations  per  node  that  are  required  to  complete  one 
time  step  (250)  is  competitive  with  2-D  computer  schemes. 

(We  have  recently  formulated  a  new  scheme  for  performing 
3-D  wave  calculations  in  materials  with  arbitrarily  nonlinear 
constitutive  properties  that  requires  only  about  2000  multiply 
and  add  operations  per  element  per  time  step). 

Based  on  timing  estimates  for  the  ILLIAC  IV,  we  have 
gauged  the  execution  speed  for  performing  5-D  linear  stress 
wave  propagation  at  40  psec  per  node  per  time  step  or  0.4  sec 
per  time  step  for  a  104-node  arbitrarily  skewed  grid.  There 
are  a  number  of  pressing  problems  in  earthquake  and  explosion 
seismology  that  require  such  a  3-D  capability.  We  note  the 
following : 

1.  Earthquake  ground  motions  generated  by  the 
spontaneous  rupture  of  subsurface  materials 
along  a  plane  of  weakness  in  a  prestressed 
region  in  the  earth’s  crust. 

2.  Explosion-generated  stress  waves  in  the  elastic 
regime  surrounding  the  inelastic  zone  of  the 
explosion;  also  the  anamolous  seismic  signal 
that  results  from  the  dynamic  relaxation  that 
takes  place  as  the  explosion- induced  fracture 
zone  is  created  in  a  prestressed  geologic 
formation. 

3.  Ground  motion  and  the  influence  of  the  subsurface 
geologic  configuration  on  the  amplitude,  frequency, 
and  duration  of  the  surface  motions. 

4.  The  interaction  between  propagating  seismic  waves 
and  underground  structures. 

The  applicability  of  the  3-D  elastic  code  will,  of  course, 
extend  far  beyond  the  seismic  field. 


II.  NUMERICAL  THEORY 


2.1  MNITE  ELEMENT  AND  MNITE  DIFFERENCE  METHODS 

Both  finite  element  (EE)  and  finite  difference  (FD) 
techniques  have  been  widely  used  for  calculating  stress 
waves  in  solids.  Considerable  experience  has  been  accumulated 
in  the  use  of  ED  techniques  for  performing  step-by-step  cal¬ 
culations  of  propagating  high  intensity  stress  waves  through 
geologic  continua;  while  the  major  application  of  the  FE 
techniques  has  been  for  performing  wave  calculations  in 
geologic  continua  and  civil  structures  where  wave  lengths  on 
the  order  of  the  structural  system  are  of  interest. 

The  diversity  in  the  application  of  these  techniques 
probably  comes  much  more  from  their  historical  development 
and  application  than  from  inherent  advantages  or  restrictions 
in  the  two  techniques.  Eor  example,  we  note  that  a  reference 
to  a  dynamic  FE  computer  code  often  carries  with  it  the 
following  numerical  implications: 

1.  Flcment  configurations  that  combine  structural 
elements  (beams,  shells,  etc.)  with  continuum 
elements  (tetrahedra,  hexahedra,  etc.)  into  a 
single  problem  representation.  No  restrictions 
are  placed  on  the  location  of  node  points. 

2.  Substitution  of  the  constitutive  laws  into  the 
equation  of  motion  prior  to  discretisation. 

The  stiffness  matrix  of  the  FE  displacement 
method  combines  three  operations  into  one: 
spatial  differencing  of  nodal  displacements 

to  get  strain,  constitutive  laws  to  relate 
stress  to  strain,  and  spatial  differencing  of 
the  stress  field  to  obtain  the  body  forces  and 
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inertial  forces.  Stresses  and  strains  are  com¬ 
puted  separately  from  the  "main-stream"  calculations 
for  the  purpose  of  computer  output  and,  in  non¬ 
linear  calculations,  for  the  purpose  of  modifying 
the  elastic  moduli. 

3.  Massive  storage  requirements  due  to  retaining  all 
of  the  influence  coefficients  between  adjacent 
node  points  (the  stiffness  matrix). 

4.  Non-explicit  time  stepping  using  either  modal 
superposition  or  implicit  time  stepping. 

These  connotations,  which  apply  to  most  but  not  all 
FE  computer  codes,  come  in  addition  to  our  understanding  of 
what  is  meant  by  the  FE  method:  "A  numerical  technique  for 
approximating  continua  by  a  discrete  system  composed  of 
elements.  The  behavior  of  the  continua  within  each  element 
is  characterized  by  interpolation  functions;  the  amplitude 
of  field  variables  at  isolated  node  points  serve  as  the 
participation  coefficients  for  the  element  interpolation 
functions.  Energy  principles  are  used  to  determine  the 
particular  combination  of  interpolation  functions  that  best 
satisfies  the  governing  conservation  equations.  Tlis 
procedure  results  in  many  algebraic  equations  involving 
nodal  values  of  the  field  variables."  The  scheme  for 
processing  these  algebraic  equations  should  not  be  confused 
with  the  scheme  used  for  generating  the  equations. 

FD  techniques  for  computing  propagating  stress  waves 
also  carry  numerical  implications  that  are  independent  of 
the  basic  FD  method,  which  we  will  describe  as:  "  A  numerical 
technique  in  which  a  continuum  is  characterized  at  isolated 
node  points,  i.e.,  the  values  of  field  variables  are 
prescribed  only  at  isolated  node  points.  A  derivative  of 
a  field  variable  at  a  node  point  is  approximated  by  some 
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predetermined  combination  of  the  values  of  the  field  variable 
at  neighboring  node  points.  Using  this  approach,  the  governing 
differential  equations  are  approximated  point  by  point  through¬ 
out  the  continuum  to  yield  a  set  of  algebraic  equations 
involving  nodal  values  of  the  field  variables." 

We  note  some  chracteristics  shared  by  many  of  the 
large  FD  computer  codes: 

1.  Zoning  by  planes  that  extend  completely  through 
the  continuum  so  that  each  zone  appears  in  the 
shape  of  a  skewed  rectangle  (bricks  in  3-D)  with 
limited  facility  for  accomodating  structural 
appendages . 

2.  Explicit  time  stepping. 

3.  Minimal  storage  requirements  due  to  the  repetitious 
calculation  of  the  influence  coefficients  between 
adjacent  node  points  at  each  time  step. 

4.  Separation  of  the  calculations  into  stages  so 
that  constitutive  laws  are  applied  at  an  inter¬ 
mediate  stage  in  each  time  step  loop.  Developing 
stresses  from  strains  explicitly  in  each  time 
step  loop  is  most  convenient  for  accomodating 
nonlinear  material  response  in  the  numerical 
calculations . 

We  note  that  both  the  FE  and  the  FD  method  characterize 
the  governing  equations  for  continua  by  a  set  of  algebraic 
equations  involving  nodal  values  of  the  field  variables. 

The  FE  method  accomplishes  this  discretization  in  a  particular 
way  that  involves  interpolation  functions  and  energy  principles. 
When  enumerated  in  this  manner,  we  are  led  to  conclude  that 
the  FE  method  is  in  fact  a  scheme  for  generating  difference 
equations  and  therefore  should  be  categorized  as  a  type  of 
FD  method.  Even  so,  there  are  some  features  of  the  FE 
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method  that  do  not  generally  appear  in  conventional  FD 
procedures  : 


1.  Because  the  IE  method  is  based  on  energy  princi¬ 
ples,  the  results  of  a  consistent  FE  calculation 
provide  bounds  on  certain  quantities,  e.g.,  strain 
energy  and  resonant  frequencies. 

2.  Because  the  FE  method  is  developed  element-by¬ 
element,  no  special  considerations  are  needed  to 
model  sharp  spatial  discontinuities  in  continuum 
properties;  also  no  special  treatment  is  needed  to 
appiy  tractions  to  element  surfaces,  either  on 
internal  or  boundary  elements. 

3.  Because  the  discrete  difference  equations  generated 
using  the  FE  method  relate  nodal  forces  and  nodal 
displacements,  the  influence  coefficients  between 
neighboring  node  points  can  be  viewed  physically 

as  spring  constants.  This  physical  insight  into 
the  numerical  scheme  allows  considerable  flexibility 
in  joining  structural  appendages  into  a  numerical 
treatment . 


2-2  FE  and  FD  DIFFERENCE  COEFFICIENTS 


In  order  to  emphasize  the  basic  similarities  between 
the  FE  and  FD  methods,  we  will  present  the  difference  co¬ 
efficients  that  correspond  to  the  two  most  widely  used  FE 
and  FD  methods.  In  so  doing  we  will  be  providing  a  common 
basis  from  which  to  compare  the  two  techniques.  Indeed,  we 
find  that  the  least  elegant  FE  analysis  and  the  least  elegant 
FD  analysis  are,  in  fact,  identical  insofar  as  the  spatial 
discretization  of  linear  isotropic  continua  using  a  uniform 
rectal inear  grid.  The  other  FE  and  ED  techniques  that  are 
investigated  in  this  section  differ  somewhat,  but,  when  a 


gross  approximation  is  used  in  the  integration  scheme  for 
evaluating  the  element  stiffness  matrix  in  the  FE  method, 
these  two  techniques  also  result  in  identical  spatial 
differencing  techniques. 

Adopting  subscript  notation,  we  substitute  the  most 
general  linear  constitutive  equation  for  an  inhomoge  nous 
anisotropic  medium 


aij  =  ljkl  ekl 


ij  kl 


3Uj 

nr 


into  the  equations  of  motion 


3a.  . 

a -r1  *  =  0“ 


J 

to  obtain  the  equation  of  motion  expressed  in  terms  of  dis 
placement 


where 


3Xj  (ti-ikl  3^)  *  fi  ‘ 


X  . 

1 

is 

the 

u . 

1 

i  s 

the 

L  .  . 

IJ 

i  s 
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0  .  . 

IJ 

is 

the 

f . 

1 

i  s 

the 

p 

i  s 

the 

Eijkl  exPresses  the  various  elastic  moduli.  For 
isotropic  materials  we  have 

Eijkl  ’  "  *ik‘jl  +  •*  hhjk  * 

Both  FE  and  FD  methods  result  in  discrete  representations 
rf  the  spatial  derivatives  appearing  in  Eq.  (2.1)  of  the  form 


32uk 


3x . 3x  . 
i  J 


Ax . Ax . 
i  3 


z 

rv  =  _ 


+  1 


£ 

?  =  _  i 


WaBy  uk(aBy) 


(2.2) 


for  a  rectilinear  3-D  grid  having  a  uniform  grid  spacing 
AXj,  Ax2 ,  Axa,  illustrated  in  Fig.  2.1.  The  various  numeiical 
schemes  are  characterized  by  the  values  of  the  27  difference 
coefficients  w^,  a  =  -1,0, +1;  B  «  -1,0, +  1;  y  =  -1,0, +  1. 

The  difference  coefficients  wagy  for  the  most  ele¬ 
mentary  FD  scheme  are  presented  first.  The  coefficients  for 

the  particular  mixed  partial  derivative  —  which 

3x i 3x  2  ’ 

are  typical  of  those  for  other  mixed  partial  derivatives, 
are  given  by 


w 


+  1,  +  1,0 


w 


+  1,-1,0 


=  w 


1,  +  1,0 


(2.3) 


with  the  remaining  23  coefficients  equal  to  zero.  The  most 
elementary  FD  scheme  for  approximating  uses  the  dif¬ 
ference  coefficients  3xi 


W+ 1,0,0  =  1 

W0,0,0  =  '2  (2.4) 

and  the  remaining  24  coefficients  are  zero.  This  difference 
scheme  for  mixed  partial  and  straight  partial  derivatives  is 
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l-“i  -\  Y  '■'•r 


a«-l  a - 0  a-  +  l 

Y«  +  l 


Fig.  2.1--Uniform  rectalinear  3-D  grid  illustrating 
concept  of  n jighbor  nodes. 


presented  graphically  in  Fig.  2.2. 


In  t.he  most  elementary  FE  scheme,  a  brick  element 
is  formed  from  two  complimentary  sets  of  five  tetrahedral  ele¬ 
ments,  illustrated  in  Fig.  2.3.  The  corresponding  difference 
coefficients  using  this  FE  configuration  are  found  to  be 
identical  with  those  presented  above  and  in  Fig.  2.2  Hence, 
we  are  led  to  conclude  that  for  a  uniform  rectilinear  grid  ’ 
the  most  elementary  IT  and  FI)  schemes  will  yield  identical 
results  if  both  schemes  use  the  same  time  stepping  procedure. 
rhis  conclusion  assumes  the  use  of  a  simple  lumped  mass 
approximation  in  the  FE  scheme  and  comparable  treatment  of 
boundary  conditions  a.id  forcing  terms  in  both  schemes. 

1V<?  n0W  examinc  a  FD  scheme,  which  is  generally  con- 
eied  to  be  well  suited  for  performing  nonlinear  calcula¬ 
tions,  based  on  its  wide  use  in  2-D  shock  codes.  It  is  de¬ 
veloped  in  two  stages:  first  the  stresses  (involving  first 
partial  derivatives  of  the  displacement  field)  are  computed 
at  the  centroid  of  the  zones,  then  these  stresses  are  dif¬ 
ferenced  to  give  stress  gradients  (the  second  partial  deriva¬ 
tives  of  displacement  appearing  in  Eq.  (2.1))  at  the  node 
points.  he  will  refer  to  this  FD  procedure  as  the  cell  centered- 
stress  differencing  method.  The  difference  coefficients  for 

the  mixed  partial  derivative  — - _  usino  tin?  . 

^  ~  using  tiiis  scheme  become 


W 

1  2 


''♦Wl.il  *  "-i.-i,.,  ■  *1/16 


1 


+1.-1,+1  =  W-l,+l,+l  ~  ~1/16 


w 


+  1.  +  1.0 


=  w 


1 , - 1 , 0  =  +1/8 


iv 


=  w 


1,+1,0  "1/8 


(2.5) 


1  0 


with  the  remaining  15  influence  coefficients  equal  to  zero. 

*  2 

The  difference  coefficients  for  - —  are  given  by 

3x2 

i 


W+l,-l,-l  =  +1/16 


W+1,+1,0  =  W+1,0,+1  =  +1/8 


W0,+1,+1  =  “1/8 


W+l,0,0  "  +1/4 


w 


0,0, +  1 


w0,*l,0  ■  '1/4 


w0,0,0  ■  -W2  (2.6) 

The  coefficients  for  the  cell-centered-stress  differencing 
scheme  are  presented  graphically  in  Fig.  2.4. 

The  cell-centered-stress  differencing  method  has  one 
undesirable  feature  that  does  not  appear  in  any  of  the  other 
methods  presented  in  this  report.  The  matrix  that  comprises 
the  complete  set  of  difference  coefficients  for  a  solid  with 
traction-free  boundary  conditions  has  eighteen  zero  eigen¬ 
values,  twelve  more  than  the  six  zero  eigenvalues  that  corres¬ 
pond  to  the  six  rigid  body  displacements  (three  translation 
and  three  rotation).  This  means  that  a  numerical  grid  can 
undergo  deformations  other  than  the  six  rigid  body  deforma¬ 
tions  Without  generating  stresses.  In  particular,  we  see  that 
all  of  the  second  order  partial  derivatives  of  Eq.  (2.1)  are 

identically  zero  for  the  twelve  special  displacement  configura¬ 
tions 
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(a)  Difference  coefficients  for 


fb)  Difference  coefficients  for  ~ 

9  x  2 

'ig-  2‘4--Difference  coefficients  using 


uk(a3y)  =  (-l)a+B 
uk(aBy)  =  (-l)a+Y 
uk(aBy)  =  (-1)6+y 
uk(aBY)  =  (-l)a+6+Y 

with  k  =  1,  2,  3  when  we  use  the  cell-centered-stress  dif¬ 
ferencing  scheme.  This  feature  of  the  cell-centered-stress 
method  is  sometimes  referred  to  as  "hour  glass  instability”. 

The  Fli  scheme  used  in  the  3-D  test  calculations  in 
Sections  III  and  V  of  this  report  employs  a  brick  element 
which  permits  linear  vacations  in  the  displacement  field 
along  the  twelve  edges  of  the  element,  quadratic  variations 
along  skewed  lines  on  the  six  faces  of  the  element,  and  cubic 
variations  along  skewed  lines  passing  through  the  element. 

For  the  case  of  a  rectilinear  brick  geometry,  the  displacement 
field  is  approximated  by 


ui  s  E  ci 


p,q,r=0 


(pqr) 


xPxV 

1  2  3 


Ax  Ax  Ax  yi 

1  ?  3  a,B,Y=0 


(-D 


a+B+Y+l 


x  +x  -2x  , 
i  i  (a)  i  ( 


x  +x  rtn  “  2x  n  \ 
2  2(B)  2(h ) 


x  +x 


3  3  ( Y )  2 X 3  C )  I  Ui(aBy) 


(2 


where  Xj(a),  x^^,  x^  denotes  the  Cartesian  coordinates 
of  the  a,  3,  y  planes  of  Fig.  2.1,  respectively,  and  x.(’2) 
denotes  the  element  centroid.  The  theory  behind  the  develop¬ 
ment  of  tnis  trilinear  (displacement  varies  linearly  in  x 
X2*  and  x^)  brick  element  is  presented  in  Appendix  A.  Tbe\lif 

ference  coefficients  for  the  mixed  partial  derivative  8  2 

3x  8x 


that  result  from  the  trilinear  brick  elem-nt  are  given  by 


w+l, tl.il  '  w-l,-l,il  ’  +  1/24 

w*i,-i.ii  '  w-i,u,ii  ’  -1/24 

K+1,+1,0  =  W-1,-1,0  “  *1/6 

"tl.-l.o  ■  K-1>+1,0  '  -1/6  (2.8) 


with  the  remaining  15  coefficients  equal  to  zero. 

linear  brick  element  coefficients  for  - —  are 

»x2 


The  tri- 


W+l,+l,+l  =  +  1/36 


"il.+l.P  =  W+ 1 ,  0 ,  + 1  =  +1/9 


W0,+1,+1  *  "1/18 


w 


11,0,0 


+  4/9 


W0,0,+l  =  W0 ,+l , 0  "  "2/9 
W0,0,0  =  "8/9 


(2.9) 


These  coefficients  are  presented  graphically  in  Fig.  2.5. 

he  note  from  Figs.  2,4  and  2.5  that  the  cel 1 -centered- 
stress  FD  method  does  not  correspond  to  the  trilinear  brick 
FE  idealization.  The  trilinear  FE  scheme  would  be  obtained, 
however,  from  a  FD  method  in  which  the  stresses  are  calculated 
slightly  closer  to  the  node  at  which  the  equation  of  motion 
is  being  applied,  namely  at  the  1/3  points  of  the  cells. 
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Such  a  FD  scheme  would  be  rid  of  the  so-called  "hour  glass 
instability";  however,  it  would  be  excessively  cumbersome  to 
compute  and  store  eight  stress  tensors  at  the  eight  1/3  points 
of  a  brick  element. 

We  have  found  that  the  3-D  FE  brick  reduces  to  the 
cell-centered-stress  differencing  scheme  if  strain  energy 
density  in  the  FE  method  is  assumed  to  be  constant  throughout 
the  brick  element.  This  is  equivalent  to  using  a  1-point 
Gaussian  quadrature  integration  procedure  for  evaluating  the 
element  stiffness  matrix.  Nonlinear  displacement  modes  that 
appear  in  the  element  displacement  field  (x  x  ,  x  x  ,  x  x  , 

1  2  1  3  2  3  * 

x i x 2 x 3  °f  Eq.  (2.7))  result  in  no  strain  energy  using  the 
1-point  integration  scheme.  That  is  to  say,  grid  displace¬ 
ments  that  give  rise  exclusively  to  nonlinear  interelement 
element  displacement  modes  (hour  glass  grid  deformations) 
result  in  no  restoring  forces  (stress)  in  the  continuum 
and  therefore  go  unchecked  in  the  numerical  calculations. 

It  is  customary  in  FE  codes  to  employ  2-  or  3-pcint  integration 
schemes  for  each  dimension  in  the  strain  energy  integral. 
Consequently  the  "hour  glass  instability"  does  arise  in  the 
FE  differencing  scheme. 

2 . 3  DESIRABLE  FEATURES  FOR  3-D  WAVE  CALCULATIONS 

In  order  to  develop  an  optimal  scheme  for  computing 
elastic  waves  in  solids,  we  have  selected  from  both  FE  and 
FD  computing  schemes.  The  following  features,  listed  approxi¬ 
mately  in  the  order  of  their  importance,  have  been  selected 
to  characterize  the  computing  technique  for  processing  3-D 
elastic  waves  on  the  Illiac  IV. 

1.  Explicit  Time  Stepping  —  A  small  time  step  on 

the  order  of  the  Courant  value  (At  =  (Ax/V  )  =  grid 
dimens ion/P-wave  velocity)  is  needed  to  achieve 
satisfactory  accuracy  in  the  propagation  of  sharp 
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wave  fronts  (wave  length  equal  to  about  10  grid 
dimensions.  This  is  true  even  for  implicit  schemes, 
which  may  be  stable  using  much  greater  time  steps. 
Numerical  experimentation  at  S3  indicates  that  a 
three-point  (t-At,  t,  fAt)  explicit  scheme  using  a 
time  step  of  0.5  Atc  can  match  the  accuracy  of 
the  best  three-point  implicit  scheme  that  uses 
twice  this  time  step;  yet  the  implicit  scheme 
requires  many  tin.es  more  operations  per  time 
step.  From  Table  I  we  see  that  a  direct  implicit 
scheme,  which  operates  using  the  two-pass  back 
substitution  procedure  (most  commonly  used 
procedure  in  implicit  FE  codes)  at  each  time 
step,  requires  nearly  thirty  times  more  calculations 
per  time  step  than  an  explicit  scheme  for  a  20  by 
20  by  20  3-D  grid.  Furthermore,  the  explicit  time 
stepping  scheme  allows  much  greater  flexibility 

for  modifying  constitutive  laws  as  the  calculation 
proceeds . 


Flexible  Grid  Configurations  —  in  the  zoning  of 
3-D  solids,  care  must  be  exercised  to  achieve 
something  close  to  an  optimal  grid  configuration. 
Rectilinear  grids,  for  example,  would  probably 
be  inappropriate  for  modeling  a  buried  explosion 
in  3-D  geometry.  The  node  numbering  scheme  of 
FE,  in  which  the  nodes  are  numbered  consecutively 
from  one  through  the  total  number  of  nodes  in  the 
grid,  accomodates  arbitrary  grid  configurations 
and  therefore  has  been  adopted.  Also,  the  FE 
state  of  the  art  for  developing  difference  equa¬ 
tions  for  complex  grid  configurations  is  more 
advanced  than  that  for  conventional  FD  methods. 
Curved  tetrahedron  and  hexahedron  elements  are’ 
systematically  developed  in  the  FE  method  using 
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isoparametric  element  techniques,  developed  in 
Appendix  A. 

3.  Quasi-linear  Constitutive  Laws  —  In  the  interest 
of  developing  a  fast  computing  scheme  for  treat¬ 
ing  very  large  grids  O  10s  nodes),  the  initial 
code  configuration  assumes  no  change  in  the 
elastic  constants  of  the  material  as  the  time 
stepping  proceeds.  However,  a  restricted  class 
of  nonlinear  behavior  can  be  accommodated  in 
the  present  version  of  the  code.  Less  re¬ 
strictive  nonlinear  constitutive  laws  can  be 
accommodated  at  some  later  date.  In  the  present 
code  configuration,  mass  storage  is  used  effect¬ 
ively  to  avoid  the  repetitive  calculation  of 
influence  coefficients  at  each  time  step  (see 
Table  I). 

2’4  DLS-CRETE  EQUATIONS,  TIME  STEPPING,  AND  ARTIFICIAL  DAMPING 

Wave  propagation  through  a  spatially  discrete  linear 
system  can  be  described  using  matrix  notation  by  the  equation 

[M]  (UiCt) }  ♦  [C^HU.Ct)}  +  [Kjj  ]  (Uj  (t) }  =  (F.(t)}  (2 

where  (U^t)}  is  a  column  listing  of  the  i—  component  of 
displacement  of  the  node  points  throughout  the  system,  (F.(t)} 
is  a  column  listing  of  the  i—  component  of  nodal  forcing1 
terms,  [M]  is  a  diagonal  listing  of  the  mass  lumped  at  the 
various  node  points,  [K^]  is  a  sparsely  populated  banded 
matrix  of  interaction  terms  between  neighboring  node  points, 
and  fC^j ]  is  matrix  that  is  introduced  for  the  special 
purpose  of  damping  certain  features  of  the  numerical  solution. 
The  matrix  equation  above  results  from  spatially  differencing 
the  wave  equation  for  an  elastic  continuum,  Eq.  (2.1). 
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TABLE  I 

3-D  ELASTIC  STRESS  WAVE  PROPAGATION 


Step-by-Step  Integration  of  the  Equations 
of  Motion 

Explicit 

f  Implicit 

Difference  Eqs. 
Calc.  Initially 
and  Stored 

Difference  Eqs. 
Calc,  at  Each 
Time  Step 

Difference  Eqs. 
Calc,  initially 
and  Stored 

1  Difference  Eqs. 

Calc,  at  Each 

1  Time  Step 

Number  of  Multiply 
and  Add  Operations 
per  Time  Step 

250  N* 

4000  N 

Direct»» 

(40*18  N2/5)N 

■nn  |W| 

Iterative*** 

250  N I 

Iterative 

4000  NI ***• 

Storage  Require¬ 
ment 

251  N 

39  N 

Direct 

(15*2N2/S)N 

Direct 

(45*2N2/S)N 

Iterative 

39  N 

I  terative 

251  N 

Outstanding 

Advantages 

1.  Very  fast 

2.  Minimal  rout¬ 
ing  between 
PE's 

1.  Easily  ex¬ 
tended  to  non- 
1  inear  be¬ 
havior 

2.  Minimal  stor¬ 
age  require- 
memts 

1.  Can  use  large 
static  soluti 
computed 

2,  Somewhat  bett 
than  explicit 

time  steps; 
ons  can  be 

er  accuracy 
schemes 

S’  Numerical 

Wave  Propaga¬ 
tion  Computer 

Codes 

1-D  Finite  Diff. 
SKIPPER 

RIP 

2 -D  Finite 
Element  in 
preparation 

2-D  Finite  Diff. 
CRAM 

HELP 

2-D  Finite 
Element 

DYNA 

3-D  Finite 
Element 

SW1S 

3-D  Finite 
Element 

SAPS 

N  Is  the  number  of  nodes  in  the  J-D  grid. 

The  matrix  of  influence  coefficients  it  trienguleriied  initially  with  beck 
substitution  at  eech  time  step. 

An  iterative  scheme  is  used  to  evaluate  the  advanced  displacements, 
l  is  the  number  of  iterations. 


21 


The  "stiffness"  matrix  [K^]  contains  the  difference  coeffi¬ 
cients  discussed  in  Section  2.2.  The  notation  is  typical  of 
that  used  in  the  finite  element,  literature,  Frazier  (1972). 
Equation  (2.10)  is  developed  in  Appendix  A. 

The  influence  that  certain  special  types  of  damping 
matrices  [C i . ]  have  on  the  resulting  calculations  can  be 
determined  for  linear  systems  by  transforming  Eq.  (2.10)  into 
modal  coordinates,  Frazier  (1969),  also  see  Appendix  B.  In 
so  doing,  the  equations  become  decoupled  (independent  from 
one  another)  for  particular  forms  of  the  damping  matrix.  For 
these  special  cases  the  influence  of  the  damping  on  the  tran¬ 
sient  behavior  of  the  discrete  system  can  be  predicted.  The 
result  is  that 

[Cij]  =  3[Kij]  (2.11) 

causes  the  eigenvectors  to  damp  as  the  square  of  their 
natural  frequencies,  i.e.,  w2  damping;  whereas 

[cij]  =  a  6ij [M] 

cause  uniform  damping  of  all  of  the  eigenvectors,  i.e.,  fre¬ 
quency  independent  damping,  as  developed  in  Appendix  B.  The 
8-type  damping  satisfies  the  condition  of  no  damping  for  zero 
strain  rate  in  an  element.  The  a-type  damping  has  little  if 
any  utility  for  wave  propagation  since  this  form  damps  all 
frequencies,  even  rigid  body  displacements  with  no  strain. 

The  0-type  damping  of  Eq.  (2.11)  is  substituted  into 
Lq .  (2.10)  and  a  more  compact  notation  is  adopted  using 
underscores  for  the  directional  component  subscripts  i 
and  j  to  obtain 

[M] (Ut }  +  6 [K] (0t  >  +  [K] (Ut )  =  {  Ft }  (2.12) 
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<or  the  collection  of  equations  of  motion  at  time  t.  We 
use  a  central  difference  approximation  in  time  to  obtain 


1 

At2 


[  M  ]  { U 


t  +  At 


2U .  +  U  ,  }  +  ^ 

-t  -t-At  2  At 


W'SfAt  -  “t-At1 


♦  (£U!lt)  ■  (Ft) 


uhere 


(2.13) 


At  <  At 
—  c 


(2.14) 


is  the  time  step,  bounded  by  the  Conrant  time  At  ,  which  is 
equal  to  the  time  it  takes  for  a  P-wave  to  cross  one  grid 
dimension,  Ax. 


We  solve  for  the  nodal  displacements  at  the  advanced 
time  step  from  Eq.  (2.14)  to  obtain  our  algorithm  for  step¬ 
ping  the  numerical  calculations  through  time 

{— t  +  At }  =  [l+sr1  ((Pt)  -  [l-D]  (Ut_  At )  +  [2-AtzM"  1  K]  {Ut ))  (2.15) 
where 

[£]  =  S  [ M ]  [K] 

and 

(Pt}  =  At 2 [M] ‘ 1 { Ft } 

he  note  that,  while  the  inversion  of  the  diagonal  matrix  [M] 
is  trivial,  the  inversion  of  the  nondiagonal  matrix  [1+D] 
is  very  tedious.  However,  for  the  case  when  3=0  (no  damp¬ 
ing),  this  term  reduces  to  an  identity  matrix.  This  leads  us 
to  believe  that  perhaps  an  approximate  inverse  can  be  obtained 
by  perturbation  techniques,  since  the  influence  of  the  damp¬ 
ing  terms  should  only  be  to  perturb  the  undamped  solution. 


Following  this  reasoning  we  approximate  the  inverse  by 
[l+£]  ’ 1  “  [1,-D+D2  .  . .  ] 


(2.16) 

The  geometric  expansion  converges  to  the  desired  ii verse 
provided  the  highest  eigenvalue  of  [D] ,  Amax,  is  less  than 
unity,  i.e., 


1  >  A 
-  max 


At 

T~ 


8  to 


max 


therefore 


4(A+2p) 

pAx2 


8  <  8 


max 


Ax2 
2  A  tV2 


(2.17) 


where  wmax  is  the  highest  natural  frequency  of  the  discrete 
system,  A  and  y  are  elastic  material  constants,  p  is  the 
mass  density,  and  VD  = 


is  the  P-wave  velocity. 


A  technique  for  predicting  an  optimum  damping  coeffi¬ 
cient  has  been  developed  in  Appendix  B.  Here  we  find  that 
optimal  values  of  8  for  removing  high  frequency  oscillations 
from  the  numerical  calculations  of  a  sharp  wave  front  lie 
within  the  convergence  criteria  of  Eq.  (2.17).  This  is  true 
provided  the  oscillations  can  be  tolerated  for  about  ten  time 
steps  following  their  initiation  by  rapid  changes  in  the  load¬ 
ing  function  or  the  constitutive  laws. 
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Using  the  approximate  inverse  expressed  by  Eq.  (2.16), 
we  obtain  our  time  stepping  algorithm,  arranged  so  as  to 
minimize  the  lumber  of  multiplication  and  add  operations  per 
time  step, 

=  <Et>  •  -  2<“tJ 

-  At 2  [M]  ‘ 1  [K]  j7|r_Pt-  t  (|_  *  i)utJ  (2. 

in  which  we  have  dropped  terms  involving  [K] [K]  for  the 
purpose  of  computing  economy.  The  errors  introduced  by 
dropping  these  terms  are  of  the  same  order  as  those  for 

approximating  the  inverse  in  Eq.  (2.16);  such  terms  io  not 
exist  with  6=0. 

The  suitability  of  the  time  stepping  algorithm  above 
is  demonstrated  in  Section  3.3  by  test  calculations  involving 
sharp  wave  fronts  propagating  through  a  chain  of  3-D  elements. 

2.5  NON- REFLECTING  BOUNDARIES 

Numerical  treatments  of  elastic  waves  in  the  earth  are 
frequently  plagued  by  waves  that  reflect  from  grid  boundaries. 
Often,  the  arrival  of  these  reflected  waves  is  sufficiently 
delayed  by  simply  extending  the  grided  region  so  as  to  not 
interfere  with  calculations  in  the  source  region.  This  pro¬ 
cedure  can  be  very  costly  in  terms  of  computer  effort,  perhaps 
excessively  so  for  many  3-D  calculations. 

A  more  appropriate  solution  for  avoiding  these  reflected 
waves  would  be  to  develop  boundary  conditions  that  do  not 
reflect  incident  waves.  Many  researchers  have  addressed  this 
problem  with  limited  success.  The  degree  of  success  of  the 
various  schemes  depends  on  either  the  frequency  content  of 
the  incident  wave  or  its  angle  of  incidence  with  the  boundary. 


18) 
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We  have  developed  and  adopted  an  extremely  simple  boundary 
condition  that  reflects  less  than  4  percent  of  the  normal 
incident  wave  energy.  On  a  practical  basis,  the  elimination 
of  normal  incident  waves  is  of  greatest  importance  since  the 
waves  that  hit  the  boundaries  at  an  angle  do  not  reflect 
directly  back  at  their  source.  We  have  not  yet  tested  the 
scheme  for  non-normal  incident  waves  to  determine  to  what 
extent  these  waves  will  be  eliminated  from  the  grided  region. 


The  reflected  waves  from  normal  incident  plane  waves 

at  a  free  surface  have  the  same  displacement  character  as  the 

incident  waves;  they  are  simply  propagating  in  the  opposite 

direction.  The  reflected  waves  from  normal  incident  plane 

waves  at  a  rigid  boundary  surface  also  have  the  same  character 

as  the  incident  waves  but  with  the  opposite  sign.  Therefore 

a  numerical  solution  that  is  taken  as  the  average  of  the  rigid 

boundary  and  the  free  boundary  calculations  should  possess  no 

reflected  waves.  Denoting  the  free  boundary  solution  as 
I  ( 2 1  |  1 

l-t+Atl  and  t*le  rigid  boundary  solution  as 

then  the  non-reflecting  solution  is  given  by 


u(D 
-t  +  At 


UPL!  =  1/2  j  ♦  1/2{U 


! 


— t  +  At  | 


(2)  ) 

^t  +  At) 


(2 


One  way  to  apply  this  scheme  for  eliminating  boundary 
reflections  would  be  to  carry  out  two  complete  solutions, 

{U  }  and  (U*-2)}.  Then  from  these  two  solutions,  the  solu¬ 
tion  with  boundary  reflections  removed  could  be  computed  from 
Fq.  (2.19).  Such  a  technique  would  be  excessively  tedious. 

A  preferable  technique  for  removing  boundary  reflections  is 
to  apply  Eq.  (2.19)  over  the  duration  of  a  single  time  step 
taking  into  account  the  appropriate  initial  conditions  for 

the  time  step,  {U^l  and  ^t-At^’  This  is  the  technique 
that  is  developed  below. 


19) 
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First  we  will  explain  how  !  ! 

|  — t  +  A 1 1 


is  obtained. 

the  time  stepping  algorithm  given  by  Eq.  (2.18)  applies 
free  boundaries  with  {Pt)  zero  on  the  boundaries,  the 
surface  solution  computed  over  a  single  time  step  comes 
directly  from  the  time  stepping  algorithm,  i.e.,  j 

i  i  [  L  ^  ut 


for  all  nodes. 


Since 

to 

free 


The  rigid  boundary  solution  j  u£^tj  differs  from 
{Ut+At}  as  taken  directly  from  Eq.  (2.18)  only  at  points 
on  the  non-reflecting  boundary.  For  node  points  along  the 
non-reflecting  boundary,  the  rigid  boundary  nodal  displacements 
are  given  by 


This  is  the  term  in  Eq.  (2.18)  that  allows  boundary  displace¬ 
ments  to  influence  internal  points  over  the  duration  of  a 
single  time  stop,  he  note  that,  with  8  =  0,  this  is  equiva¬ 
lent  to  setting  the  particle  velocity  at  t  +  hAt  equal  to 
zero  for  those  nodes  on  a  rigid  boundary. 

Consequently,  the  advanced  displacement,  taken  as  the 
average  of  the  rigid  and  free  advanced  displacements,  becomes 


Jn(3) 


I 


Mt./st 


for  nodes  not  along  the  non-reflecting  boundary  and 


luOn  I 

(Mt.it, 


1/2!Mt.it 


1/2j(1  *  st)ut  '  It  ut-it|  (2-20) 


for  those  nodes  along  the  non- ref lecting  boundary.  The  advance 
displacements  j  U  ^  + 1 1  {  are  then  used  in  Eci-  (2.18)  at  the 
next  time  step. 
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Test  calculations  using  this  technique  for  eliminating 
boundary  reflections  from  normal  incident  waves  are  presented 
in  Section  3.3. 
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III.  ELASTIC  WAVE  CALCULATIONS  USING 
NON-PARALLEL  PROCESSING 

3‘1  SPHERICALLY  SYMMETRIC  ELASTIC  EXPLOSION  CALCULATIONS 

As  the  first  problem  for  performing  comparative  cal¬ 
culations  using  conventional  FE  and  FD  techniques,  we  have 
computed  the  elastic  compression  waves  that  radiate  from  a 
spherical  cavity  undergoing  an  exponentially  decaying  pres¬ 
sure  step.  The  exact  solution,  Blake  (1952),  contains  a 
discontinuity  in  the  stresses  and  the  particle  velocity  at 
the  wave  front  and  therefore  serves  as  a  severe  test  of 

the  numerical  techniques  for  simulating  body  wave  propaga¬ 
tion. 

The  parameters  used  in  the  calculation  are  as  follows: 


Ar  =  zone  size  =  0.1  m 

Vp  =  P-wave  velocity  *  5  km/sec 

Vg  =  S-wave  velocity  =2.5  km/sec 

v  =  Poisson’s  ratio  =  1/3 
p  =  mass  density  =  2.0  g/cc 

_  3 

=  cavity  l°ad  (kbar)  =  e_t  ^ 

Computed  particle  velocities  at  2.06  msec  following  the  ap¬ 
plication  of  the  pressure  loading  are  presented  in  Fig.  3.1, 

The  FD  calculations  have  been  reported  on  previously, 
Cherry,  et  al.  (1972).  They  were  performed  using  the  1-D 
SKIPPER  code  for  spherically  symmetric  waves  with  an  ex¬ 
plicit  time  stepping  scheme. 
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Particle  Velocity,  m/sec 


Fig 


3 . 1  -  -  FI:  and .  FD  computed  particle  velocity  at  0.00206  sec 
ollowing  the  application  of  an  exponentially  de¬ 
caying  pressure  step  to  a  spherical  cavity. 
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I  he  FIi  calculations  were  performed  for  an  axisymmetric 
wedge  of  elements  using  the  2-D  DYNA  code.  The  DYNA  code, 
which  uses  an  implicit  time  stepping  scheme,  was  originally 
developed  by  Wilson  (1969).  In  this  code,  Wilson's  original 
time  stepping  scheme  has  been  replaced  by  an  extended  form 
of  Newmark's  g  method,  Newmark  (1959),  see  also  Goodreau  (1970). 

Based  on  the  FL  and  FD  calculations  for  the  pressure- 
loaded  cavity,  we  have  made  the  following  observations: 

1.  Both  techniques  result  in  numerical  oscilla¬ 
tions  (similar  to  the  Gibb's  phenomena 
associated  with  a  truncated  Fourier  series) 
following  the  discontinuous  wave  front  when  no 
artificial  damping  is  used  in  the  calculations. 

2.  A  reduction  in  At  does  not  significantly  im¬ 
prove  either  numerical  calculation  unless  it  is 
accompanied  by  a  reduction  in  the  zone  size. 

he  note  that  the  implicit  FE  calculations  use 
a  time  step  equal  to  the  Courant  time  step 

~  which  is  twice  as  large  as  the 

time  step  used  in  the  explicit  FD  calculations. 

3.  The  maximum  particle  velocity  from  the  FE  cal¬ 
culation  is  somewhat  higher  than  the  FD  peak; 
however,  this  is  probably  mainly  the  result  of 
slightly  less  artificial  damping  in  the  FE  cal¬ 
culations  . 

4.  The  displacements  that  are  computed  by  the  two 
codes  follow  the  exact  solution  equally  well 
throughout  the  calculation  with  a  maximum  of 

3  percent  deviation  from  the  exact  displacement 
at  the  cavity  wall. 

5.  Neither  solution  deteriorates  significantly  at 
times  much  greater  than  that  shown  in  Fig.  3.1. 
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The  spherically  symmetric  elastic  explosion  calcula¬ 
tions  were  also  performed  using  a  two-dimension  axisymmetric 
grid  over  one  quadrant  in  order  to  test  for  deviations  from 
spherical  symmetry.  The  FD  calculations  were  performed 
using  the  2-D  Lagrangian  CRAM  code.  Five  decimal  places  of 
spherical  symmetry  were  achieved  in  the  FD  calculations. 

The  FE  DYNA  calculations  displayed  about  2  percent  variation 
from  true  spherical  symmetry  as  the  result  of  approximations 
m  terms  containing  1/r,  a  consequence  of  axisymmetric  geo¬ 
metry.  This  result  has  no  direct  implication  on  3-D  cal¬ 
culations,  since  terms  of  this  type  do  not  appear. 

3.2  LAMB'S  PROBLEM 

Certainly  one  of  the  most  challenging  problems  for 
testing  the  limitations  of  a  numerical  procedure  for  computing 
elastic  waves  is  Limb's  Problem,  Lamb  (1904),  Ewing,  et  al., 
(1957):  An  impulse  loading  applied  at  a  point  on  the  free 
surface  of  a  half-space.  The  exact  solution  involves  sharply 
peaked  wave  configurations.  A  significant  portion  of  the 
seismic  energy  is  channeled  along  the  free  surface  in  the 
form  of  a  Rayleigh  wave. 

The  solution  to  a  delta  function  loading  in  time  and 
space  applied  at  the  surface  of  the  half-space  serves  as  the 
Green's  function  for  obtaining  solutions  to  problems  with 
nonsingular  source  conditions.  Viecelli  (1971)  has  used 
this  approach  for  obtaining  smoothed  wave  forms  for  comparison 
with  numerical  results  obtained  using  the  2-D  FD  TENSOR  code. 
He  treated  the  free  surface  response  to  a  uniform  strip 
loading  (of  width  2a  =  8  cm  =  8  grid  dimensions)  applied 
normal  to  the  free  surface  with  the  time  history 

P(t)  * 
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2aiTT  [  1+  (t/r  )2] 
0  o 


(3.1) 


for  -oo  <  t  <  +°°,  see  Fig.  3.2.  The  sharpest  loading  history 

that  Viecelli  treated  was  for  t  =5  usee  (At  =  1  usee)  with 

.  o 

a  total  impulse  L  =  1.97  *  10  dyne-sec  per  cm  normal  to  the 
section  of  plane  strain.  Using  25,000  zones  in  the  2-D 
plane-strain  calculation,  Viecelli  was  able  to  reproduce  the 
Rayleigh  wave  displacements  at  the  free  surface  quite  well. 

Halda  and  Cherry  (1972)  employed  S3,s  2-D  FD  CRAM 
code  for  computing  the  Rayleigh  wave  described  in  Viecelli' s 
report.  The  following  parameters  were  used  in  the  calcula¬ 
tion  : 


P 


p  =  mass  density  =  2.77  g/cm3 

Vp  =  P-wave  velocity  =  5.55  x  10s  cm/sec 

Vs  =  S-wave  velocity  =  3.145  x  10s  cm/sec 

VR  =  Rayleigh  wave  velocity  =  2.898  x  10  *  cm/sec 

At  =  time  step  =  1  usee 

Ax  =  Ay  =  zone  size  =  1  cm 

2a  =  width  of  strip  load  =  8  cm 

(t)  =  surface  load  for  t  >  -20  usee  =  —  dyn/cm2 

1+  (t/5) 2 

+«o 

L  =  total  impulse  =  2a  J*  p(t)dt  =  1.97  x  101*  dyn-sec/cm 

-20 


Results  from  the  CRAM  calculation  are  compared  with  Viecelli's 
analytic  solution  along  the  free  surface  at  t  =  80  usee  in 
Fig.  3.3.  The  agreement  between  the  numerical  and  the 
analytical  displacement  of  the  free  surface  is  comparable 
to  that  achieved  by  Viecelli  using  the  TENSOR  code.  Inci- 
dently,  both  the  CRAM  and  the  TENSOR  code  use  the  cell 
centered  stress  differencing  scheme  presented  in  Section  2.2. 

It  was  our  intention  to  employ  a  FE  code  for  the 
Rayleigh  wave  computation  above.  However,  the  DYNA  code, 
referred  to  in  the  previous  section,  is  restricted  to  grids 
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Fig.  3.2--Surface  load  used  in  the  FE  and  FD 
calculations  for  Lamb's  problem. 
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3. ^--Vertical  displacement  of  the  free 
surface  at  t  =  80  usee. 
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that  fit  into  computer  core.  For  S3,s  UNIVAC  1108,  this 
corresponds  to  a  grid  18  zones  deep  and  25  zones  along  the 
free  surface  f S3  now  has  a  2-D,  3-D  dynamic  FE  code  that  is 
not  restricted  in  this  manner).  In  order  to  obtain  results 
at  80  yscc ,  the  grid  size  was  increased  to  Ax  =  Ay  =  1.555  cm 
and  the  strip  loading  was  applied  as  a  single  pulse  at  t  -  0, 
i.e. , 


2.22  x  10  9  [1  -  - Lll - dyn/cm2 

\  2.86x10  / 

for  -At  <  t  <  At,  see  Fig.  3.2.  The  value  At  =  2.86  ysec  is 
just  slightly  greater  than  the  time  required  for  a  P-wave  to 
cross  one  grid  dimension.  The  amplitude  of  the  strip  loading 
^max  ~  2*22  x  10  dyn/cm2  is  applied  over  one  grid  dimension 
to  each  side  of  the  plane  of  symmetry  (2a  =  2Ax  =  3.11  cm)  to 
give  a  total  impulse 

♦At 

L  =  2a  p(t)dt  =  1.97  x  io4  dyn-sec/cm 

-At 

The  vertical  displacement  of  the  free  surface  that 
results  from  the  FE  computations  at  80  ysec  is  also  presented 
in  Fig.  3.3.  Here  we  see  that  the  Rayleigh  wave  displacement 
for  the  FE  calculation  has  a  somewhat  sharper  wave  form  and 
higher  amplitude  peak  than  the  corresponding  FD  and  analytic 
results.  This  is  a  direct  result  of  the  difference  in  the 
two  source  characters,  illustrated  in  Fig.  3.2. 

The  displacements  and  velocities  of  points  throughout 
the  FE  grid  are  illustrated  in  Figs.  3.4  and  3.5.  Results 
are  presented  at  six  points  in  time:  t  =  24.9,  34.3,  45. 7 
57.2,  68.6,  and  80.0  ysec.  In  these  plots,  we  see  the  de¬ 
velopment  and  propagation  of  the  P-wave,  the  S-wave,  and  the 
Rayleigh  wave.  The  various  wave  phenomena  are  most  effectively 
36 
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Fig.  3.4b--FH  displacements 
loading  applied  to  the  free 

problem . 


Fig.  3.5a  —  I7I-  velocities  and  kinetic 
contours  generated  by  impulse  loadin 
to  the  free  surface,  2-P  Lamb's  prob 
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Fig.  3. 5b- -FE  vele:ities  and  kinetic  energy 
contours  generated  by  impulse  loading  applied 
to  the  free  surface,  2-D  Lamb’s  problem. 


depicted  in  Fig.  3.5,  which  contains  vectorial  plots  of  the 
velocity  field  sampled  at  the  node  points  and  contour  plots 
of  the  kinetic  energy.  These  techniques  for  presenting  wave 
phenomena  should  prove  valuable  for  depicting  waves  in  3-D 
geometry . 

3.3  5-D  SWIS  CALCULATIONS 

A  series  of  plane  wave  calculations  were  performed 
on  S  's  UNIVAC  1108  using  the  SWIS  code  (Stress  Waves  In 
Solids) .  These  serial  computer  calculations  were  directed 
toward  the  following  objectives: 

1.  To  test  the  utility  of  the  FE  technique  for  per¬ 
forming  3-D  wave  calculations,  particularly  waves 
that  propagate  a  stress  jump. 

2.  To  test  the  explicit  time  stepping  algorithm  and 
the  associated  artificial  damping  concept  pre¬ 
sented  in  Section  2,4  for  speed  and  accuracy. 

3.  To  test  the  nonreflecting  boundary  treatment  pre¬ 
sented  in  Section  2.5. 

We  consider  a  step  planar  loading  p  applied  to  the 
free  surface  (x^  =  0)  of  a  semiinfinite  continuum  at  t  =  0. 
Plane  waves  are  generated  that  propagate  in  the  x  direction 
with  a  discontinuity  in  partical  velocity  and  stress  at  the 
wave  front.  Using  small  displacement  theory  and  linear, 
homogeneous,  and  isotropic  continuum  properties,  the  propagat¬ 
ing  wave  motion  is  expressed  analytically  as 

Cct-Xi)  for  <  ct 
0  for  x  >  ct 
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and 


(  (T^m)  for  \i  ct 

u  =  < 

\  0  for  x  >  ct 

i 

where  c  =  Vp  (P-wave  velocity)  and  u  =  u  for  normal 
loading  (p  -  alJ)»  and  c  =  Vg  (S-wave  velocity)  and  u  =  u 

for  shear  loading  (p  =  -a  ).  2 

1  2 

The  planar  waves  are  modeled  numerically  using  a  string 
100  uniform,  cubic  elements  in  the  x  direction.  To  assure 
u2  =  U3  =  0  for  P-wave  motions,  the  boundary  nodes  (all  nodes 
for  this  element  configuration)  are  constrained  to  move  only 
in  the  x^  direction.  The  following  parameters  were  used  in 
the  FL  calculations: 

Vp  =  10 6  cm/ sec 

Vg  =  5  x  10  5cm/sec 

p  =  2.0  gin/cm3 

v  =  1/3 

A+  2y  =  2  x  io12  d> n/cm2 
P  =  108  dyn/cm2  =  100  bars 
Ax  =  Ax  =  Ax  =  1  cm 

1  2  3 

At  =  0.5  Msec  =  1/2  (Ax/Vp) 

From  this  numerical  experiment  we  find  that: 

1.  Spurious  high  frequency  (wave  length  less  than 
8  grid  dimensions)  motions  appear  in  the  numeri¬ 
cal  results  with  a  signal  to  noise  ratio  just 
behind  the  wave  front  of  about  3  for  particle 
velocity  when  no  artificial  damping  is  used, 
see  I ig.  3.6.  The  spurious  signal  does  not 
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Fig.  3 . 6- -Undamped  P-wave  at  t 


60  nsec. 


significantly  degrade  the  computed  particle 
displacements  using  At  =  j  ^  ,  also  illustrated 
in  Fig.  3.6.  P 

2.  The  discontinuity  in  particle  velocity  is 
suitably  modeled  numerically  using  a  damping  co¬ 
efficient  6/At  =  0.2,  illustrated  in  Fig.  3.7. 

Using  a  damping  coefficient  6/At  =  0.18,  the  wave 
front  is  propagated  through  the  grid,  which  is 
terminated  by  a  fiee  surface,  and  part  way  back 
to  the  source,  illustrated  in  Fig.  3.8.  From 
this  exercise  we  note  that  the  steep  wave  front 
does  not  appear  to  experience  excessive  deterio¬ 
ration  as  the  wave  propagates. 

3.  .iie  explicit  time  stepping  algorithm  developed 
in  Section  2.4  is  extremely  fast.  Waves  were 
propagated  through  the  404-node  chair,  of  100 
elements  at  the  rate  of  1/2  sec  of  UNIVAC  1108 
CPU  time  per  time  step.  Considering  the  dis¬ 
proportionately  large  number  of  constrained  dis¬ 
placement  components  in  this  test  problem,  we 
estimate  that  vaves  passing  through  large  3-D 
meshes  would  be  processed  at  only  about  1/5  this 
rate,  or  0.006  sec  per  node  per  time  step.  We 
consider  this  to  be  a  very  good  processing  rate. 

Alternate  boundary  conditions  were  tested  at  the 
surface  x^  =  100  cm.  A  rigid  boundary  results  in  a  reflected 
wave  with  zero  particle  velocity  behind  the  reflected  wave 
front.  The  SWIS  calculations  resulted  in  a  particle  velocity 
in  the  reflected  wave  five  orders  of  magnitude  below  the  in¬ 
coming  particle  velocity  of  50  cm/sec. 

The  non-re  plecting  boundary  condition  described  in 
Section  2.5  was  tested  at  the  boundary  surface  x  =  100  cm. 
Following  the  incidence  of  the  P-wave  with  the  non-reflecting 

boundary,  t  >  100  ysec,  the  computed  particle  velocity  remains 
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Fig.  3.7--P-wave  at  t  =  60  ysec  demonstrating  the 
influence  of  artificial  viscous  damping  on  the  com¬ 
puted  particle  velocity. 
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at  50  cm/sec  throughout  the  grid,  Fig.  3.9.  Actually  a 
small  ripple  in  the  particle  velocity  (less  than  1  cm/sec) 
was  reflected  from  the  boundary  back  into  the  grid  which  does 
not  show  up  on  the  plot. 

Calculations  were  also  performed  with  the  planar 

loading  p  applied  to  the  surface  x  =  0  in  the  direction 

i 

x2 •  This  load  configuration  generates  shear  waves  propagat¬ 
ing  in  the  direction.  Using  the  same  parameters  for  the 
S-wave  calculations  as  for  the  P-wave  calculations,  essen¬ 
tially  the  same  accuracy  and  speed  is  achieved  in  the  S-wave 
calculations  as  in  the  P-wave  calculations.  The  optimum 
damping  coefficient  (3/At  =  0.5)  was  found  to  be  somewhat 
greater  than  that  for  the  P-wave  calculations. 
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Distance  (cn) 


Fig.  3.9--P  wave  propagating  into  a  nonreflecting 
boundary,  danping  coefficient  B/At  =  0.18 


IV.  ILLIAC  CODE  FOR  3-D  ELASTIC  WAVES 


4'1  numerical  problem  definition  and  i-n  nmp  generate 

A  considerable  quantity  of  data  is  necessarily  re¬ 
quired  for  defining  arbitrarily  complex  3-D  grid  systems. 

In  order  to  remove  all  restrictions  from  the  node  configura¬ 
tion.  data  must  be  provided  for  individually  locating  each 
node  point  in  the  grid.  Since  arbitrary  node  configurations 
lead  to  nonsystematic  numbering  of  the  nodes  associated  with 
an  element,  connectivity  information  is  required  in  addition 
to  the  node  locations.  Also,  arbitrarily  inhomogeneous 
material  properties  will  need  to  be  specified  element  by 
element  throughout  the  grid.  In  all,  a  completely  arbitrary 
N-node  grid  will  require  about  17N  data  items.  Such  a  re¬ 
quirement  would  make  3-D  numerical  calculations  exceedingly 
tedious  to  set  up. 


The  massive  data  requirements  can  be  reduced  to  a 
manageable  level  by  generating  the  grid  data  in  regular 
ordered  portions  of  the  grid  and  providing  detailed  grid 
specifications  in  unordered  portions  of  the  grid.  The  more 
versatile  the  grid  generating  capability,  the  less  tedious 
the  data  preparation. 


In  preparation  for  processing  large  3-D  grid  configura¬ 
tions  on  the  Illiac,  a  technique  has  been  developed  for  auto¬ 
matic  3-D  grid  generation.  A  grid  of  skewed  hexahedral  ele¬ 
ments  is  mapped  into  simple  cubic  shapes,  all  elements  of 
identical  size.  The  grid  is  then  generated  for  the  cubic 

geometry  and  mapped  back  into  the  skewed  problem  geometry 
for  processing. 

The  versatility  of  this  technique  for  generating  3-D 
grids  depends  on  the  type  of  mappings  that  can  be  systematical¬ 
ly  performed.  At  this  time  we  have  developed  three  types 
of  mapping:  spherical  cartesian,  cylindrical  cartesian, 
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and  polynomial  cartesian.  In  each  of  these  three  map¬ 
pings,  the  zone  size  can  change  at  a  controlled  rate  in 
three  directions. 

Of  the  three  mappings  above,  the  polynomial  mapping 
is  the  most  versatile  for  generating  skewed  3-D  grid  con¬ 
figurations.  The  coordinates  of  twenty  points  around  a 
mass  of  continuum  are  used  to  define  the  mapping,  eight  of 
the  points  are  to  form  the  exterior  corners  in  the  cubic 
element  configuration  and  the  remaining  twelve  points  form 
the  mid  edge  points  in  the  cubic  configuration.  A  curved 
grid  is  generated  when  the  edge  points  do  not  lie  along  a 
straight  line  connecting  the  respective  corner  points.  The 
grid  spacing  is  reduced  in  the  vicinity  of  a  corner  point 

as  the  adjacent  edge  points  are  moved  in  the  direction  of  the 
corner  point. 


At  this  time  the  difference  equations  are  being 
generated  or  S3's  Univac  1108  using  the  3-D  FE  SWIS  code. 

The  influence  coefficients,  generated  in  this  manner,  be¬ 
come  data  for  the  ILLIAC  time  stepping  code.  This  procedure 
restricts  the  ILLIAC  code  to  computing  stress  waves  in 
materials  with  linear  stress-strain  laws,  since  no  provi¬ 
sion  is  made  for  changing  the  influence  coefficients 
during  the  time  stepping  calculations.  This  procedure  of 


supplying  the  parallel  processor  code  with  predetermined 
influence  coefficients  permits  the  linear  wave  propagation 
calculations  to  be  independent  of  the  grid  type.  However, 
the  transfer  of  large  numbers  of  influence  coefficients  to 
the  ILLIAC  site  for  parallel  processing  does  not  appear  to 


be  desirable  for  the  final  code  configuration. 


In  the  coming 


contract  year,  1973,  code  will  be  developed  for  generating 
difference  equations  on  the  ILLIAC  IV  computer. 
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While  early  versions  of  the  SWIS  code  will  require 
that  either  the  grid  be  fairly  regular  or  else  that  the  co¬ 
efficients  of  the  difference  equations  be  generated  ex¬ 
ternally  on  a  serial  machine,  good  progjess  has  been 
made  in  the  development  of  algorithms  for  generating  co¬ 
efficients  in  the  ILLIAC  for  the  rather  general  types  of 
grids  that  the  time  stepping  algorithm  is  equipped  to  handle. 

The  big  problem  is  not  the  generation  of  the  coeffic¬ 
ients  but  rather  rearranging  and  storing  them  where  the  time- 
stepper  expects  to  find  them.  In  order  to  be  more  specific, 
it  is  necessary  to  lave  a  detailed  specification  for  the 

storage  layout  for  the  variables  and  constants  that  occur 
in  Eq.  (2.18). 

Equation  (2.18)  has  the  form 

fW}  ■  *  (4HKt)  (4.1) 

where 

{~t}  =  {V  "  { — t  -  A  t }  +  2{{±t]  (4.2) 

TV’!mEt-!tVAt*(i-yutj  (4.3, 

[A]  =  -  At 2 [M] "  1  [ K ]  (4>4) 

and  ^t+At^  retains  its  previously  stated  significance. 

Equation  (4.1)  can  be  written 

-n  =  Xn  +Em  anm  wm  (4.5) 

wiiere  u_n,  vr,  and  wm  are  elements  of  the  vectors  {U  ), 

{-t}  and  {V  respectively  and  anm  is  an  element  of  the 
matrix  [A].  The  vector  elements  ur ,  vr  and  w  are  dis¬ 
placement  -  1  ike  vectors  (in  fact,  un  is  a  displacement,  that 
of  node  n  at  time  t  +  At)  and  they  are  represented  by  three 
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floating-point  numbers  wi 
inm  is  a  3  x  3  influence 
about  the  effect  of  node 
It  takes  nine  numbers  to 
PEM  words  to  store  those 


thin  the  computer.  Correspondingly 
matrix  tnat  contains  information 
m  on  the  displacement  of  node  n. 
represent  and  it  takes  nine 

numbers . 


I 


In  the  first  parallel  version  of 
is  provided  in  PEM  for  the  two  vectors 
are  calculated  at  the  beginning  of  each 
[A] ,  which  does  not  change  from  step  to 
compressed  form  on  the  disk,  from  which 
time  step. 


the  SWIS  code,  space 
{Vt)  and  {Wt}  which 
time  step.  The  matrix 
step,  is  stored  in 
it  is  read  once  each 


The  compression  of  the  A  matrix  is  achieved  by 
all  the  zero  elements.  The  identity  of  the  non-zero 
is  established  by  storing  the  subscript  pair  n,  m  a 
the  nine  floating  point  numbers  representing  a  ,  a 

that  adds  only  two  half-words  or  one  full  word  to"the 
required . 


omitting 
elements 
long  with 
strategy 
space 


To  appreciate  how  much  storage  is  involved,  consider  a 
problem  in  which  the  number  of  nodes  is  N  =  10,000.  Such  a 
value  of  N  may  be  too  small  to  give  a  good  representation 
of  many  three-dimensional  structures  but  we  are  limited  to 
such  a  value  of  N  by  the  storage  requirement  for  {V  }  and 

{-t}  whlch  are  stored  simultaneously  in  the  PEM,  and  each  of 
which  consumes  3N  words  of  space. 


In  a  rectangular  grid  interior  nodes  have  26  neighbors, 
not  counting  themselves,  so  rows  of  [A]  corresponding  to 
interior  nodes  will  have  only  27  non-zero  elements.  Rows 
corresponding  to  boundary  nodes  will  have  even  fewer.  Thus, 
in  each  row  the  fraction  of  matrix  elements  which  are  non-zero 
is  no  greater  than  27/N  -  0.0027.  The  total  space  required 
to  hold  the  compressed  A  matrix  is:  (number  of  nodes)  x 
(number  of  neighbors)  x  (number  of  words  of  storage  for  a 


matrix  element) 
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27  x  10  -  2.7  x  io6  words  when 


N  =  10**.  Thus,  it  takes  2.7  *  1 0 6 / 1  3  x  106  =  0.18  of  the  disk 
to  hold  [A]  when  N  =  10“ . 


In  all  that  follows,  if  m  is  a  positive  integer, 
let  p(m)  denote  the  remainder  resulting  from  the  division 
of  m  by  64.  Thus,  0  p(m)  <  64  and  64  divides  p(m)-m 
evenly.  Further,  let  PEp(m)  denote  PEr  where  r  =  p(m). 


The  notational  scheme  just  defined  provides  a  compact 

method  for  describing  storage  arrangements  in  ILLIAC.  In 

particular,  the  element  w^  of  {W^}  is  stored  in  PEp(m-l); 

i.e.,  w  is  stored  in  PE  ,  w  in  PE  ,  w  in  PE  and  w 

in  PhQ.  The  A  matrix  is  stored  so  that  a  can  be  read 

by  PEp(m-l),  i.e.,  the  PE  that  holds  w  ,  the  element  to 

— m 

be  multiplied  by  The  matrix  elements  are  further 

arranged  so  that  they  can  be  read  in  increasing  sequence  on 
the  first  node  number  n. 


4.3  SORTING  THE  A  MATRIX 
4.3.1  The  Sort  Problem 

It  would  be  inefficient  to  calculate  the  elements 
ln,m  of  the  matrix  [A]  defined  in  Eq.  (4.1)  in  the  order 
in  which  they  are  required  for  the  time  step  calculation. 
Therefore,  some  means  must  be  provided  for  arranging  them  on 
the  disk  in  their  proper  locations.  The  content  of  the 
present  section  is  a  method  for  laying  out  the  A  matrix  start¬ 
ing  with  a  file  of  non-zero  matrix  elements  and  their  associa¬ 
ted  subscript  pairs  in  arbitrary  sequence.  We  assume  that 
the  file  is  stored  on  the  disk  in  the  following  format; 

1.  Each  page  of  the  file  contains  64  16-word  records. 

2.  Each  record  contains  one  matrix  element  a 

_  „  =n,m 

consisting  of  9  numbers  and  the  indicies  n  and 

m  as  well. 
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3.  The  records  are  arranged  on  the  page  as  if  they 
had  been  written  as  16  64-word  rows,  4  records 
coming  from  each  row  and  the  16  words  of  each 
record  coming  from  16  consecutive  PE's. 

For  a  problem  of  N  =  lG1*  nodes,  as  described  in 
Section  4.2,  the  number  of  matrix  elements  will  be  27N  = 

2.7  x  105.  The  disk  will  hold  (300  pages/band)  x  (64  records/ 
page)  =  15,200  records/band  and  therefore  the  whole  compressed 
matrix  will  occupy  2.7  x  10 5 / 1 . 9  x  10“  =  14.2  hands.  This  is 
more  space  than  was  estimated  in  Section  4.2  because  here  we 

take  16  words  rather  than  10  words  for  each  non-zero  matrix 
element . 

In  the  final  configuration,  the  records  are  to  be 

arranged  on  the  pages  so  that  when  a  page  is  read  as  16 

64-word  rows,  one  record  will  enter  each  PE,  the  16  words  of 

the  record  falling  in  16  consecutive  rows.  Record  a  will 

=n,m 

enter  PEp(m-l),  i.e.,  PEp  will  receive  columns  64*S,+p  +  l  of  the 
matrix  [A] ,  where  0  <  l  <  N/64  and  N  is  the  order  of  [A]  . 
Finally,  the  records  are  to  be  sorted  in  row-order,  i.e.,  if 
ni  <  r*2  anc*  =  P(m2)  then  a  is  located  ahead 

°f  «in2  mz  •  There  are  usually  27  non-zero  elements  per  row 
so  that  a  given  PE  will  receive  elements  from  fewer  than  half 
the  rows.  On  the  other  hand,  row  n  could  have  non-zero 
elements  in  columns  m  and  m  +  64  in  which  case  PEp(m) 
would  receive  at  least  two  elements  of  row  n. 

4.3.2  Brief  Description  of  the  Procedures 


Having  completed  these  preliminaries,  we  can  describe 
the  strategy  underlying  the  sorting  procedure.  The  first 
step  is  to  read  the  unsorted  file,  which  we  denote  as  F 
and  rewrite  it  in  four  new  files 
new  file 

16 j  £  p(m)  <  1 6 j  +  16,  0  <  j  <4. 


F  .  will  contain  those  elements 
i  *3 

<  16  j 


o  *  o 

where  0  5  j  <  4.  The 
for  which 


=n,m 

Step  2  consists  in 
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copying  each  file  F 


i » J 


into  four  shorter  files 


F  .  ,  and  F  .  _ 

2  »  4j  +2  2  ,  4j  +3 

will  contain  those  elements 


for  all  j  ,  0  <  j  <  4.  File 


F  .  F 

2  >  4  j  '  2  ,  4  j  +1 ' 


2  ,  1 


»n,m  for  which  4i  1  p(m)  <  4i+4. 


Step  3  consists  of  copying  each  file  F  .  into  four  still 

where  0  <  k  <  4  for  all  j,  0  <  j  <  16. 

a  for  which 


shorter  files  F  .  , 

3  , 4  J  +k 

Fiie  F  will  contain  all  those  elements 

p(m)  =  p! 


=n  ,m 


In  step  4,  each  file  F^p  from  step  3  is  read  into 
the  PEM,  sorted  on  row  index  n  and  written  out  on  file  F 
0  <  p  <  64.  We  assume  that  all  files  F  have  no  more  than 

(1024  rows)  x  (4  matrix  elements/row)  =  4096  matrix  elements. 
Thus,  the  matrix  can  have  at  most  4096  x  64  =  218  non-zero 
elements.  It  follows,  also,  that  F  and  F  have  at 
most  64  pages  each  so  they  will  each’fit  in  two'strips. 

The  last  step,  step  5,  is  unlike  the  preceding  ones. 

It  begins  by  reading  the  first  page  of  each  of  the  64  files 
P4,p  and  writing  their  contents  at  the  first  64  pages  of  the 
sorted  file  F^  Between  reading  and  writing  there  occurs  a 
certain  transposition  of  the  data.  Matrix  elements  from  file 
F4,p  fal1  int0  garter-rows  when  they  are  read  but  they  must 
be  rearranged  so  that  they  occupy  16  consecutive  rows  of  PEp 
when  they  are  written  on  file  F^  A  way  of  performing  the 
transposition  is  described  in  some  detail  in  the  paragraphs 
below.  After  the  first  64  pages  of  F  are  written,  the 
second  page  of  each  F  is  read,  for  all  p,  and  these 
data  are  transposed  and  written  as  pages  64  through  127  of  F 
and  so  on  until  F^  contains  all  non-zero  elements  of  the 
matrix  A. 


4.3.3  A  Lower  Bound  on  the  Time  Required 

It  is  already  possible,  on  the  basis  of  this  preliminary 
description,  to  estimate  the  I/O  time  ^jq  required  to  per¬ 
form  all  the  data  transfers  between  disk  and  PEM.  It  would  be 
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of  some  interest  to  have  a  formula  for  Tta  in  terms  of  the 


order  N 
file  F 
2  1 2 


_  ,  10 
of  the  matrix  A  and  the  length  L 


of  the  input 

o,o  ^Ut  f°r  t^e  present  we  will  assume  that  F  has 
pages  and  that  no  file  F ^  will  have  more  than  27’pages. 
In  addition,  we  assume  just  one  ECU  operates  at  a  time  so  that 
reading  and  writing  cannot  be  overlapped. 


Under  these  assumptions,  the  first  three  steps  each 
require  that  21Z  pages  be  read  and  Written,  so  step  s  will 
take  time  T  =  213T  where  1  <  s  <  3  and  T  is  the  time 

s  p  —  P 

required  to  transfer  a  page. 


For  step  4,  assume  that  each  F  can  be  read  in  one 

,  j  3  p  P 

rotation  and  that  F  can  be  written  in  one  rotation. 

4  9  P 

Then  step  4  takes  =  27Tr  where  is  the  rotation  time. 

On  step  5,  assume  that  it  takes  one  rotation  to  read 

the  k  page  of  each  file  F^  p  and  one  rotation  to  write 

the  k—  segment  of  F  ,  for  0*<  k  <  26.  Then  T  =  27T 

5  —  5  r 

Using  T  =  133  ys  and  T  =  40  ms,  we  get 


Vs  -  3  x  213  Tp  +  2  x  27  Tr  =  21.9  sec 

The  five  steps  outlined  above  fall  into  three  phases. 
Phase  A  consists  in  separating  the  records  into  64  bins, 
which  are  the  files  F  .  Phase  A  encompasses  steps  1.  2. 
and  3.  The  records  in  each  bin  are  arranged  in  ascending 
sequence  in  step  4  which  is  Phase  B.  Then  the  ordered  lines 
are  merged  in  step  5  which  is  Phase  C.  This  3-phase  strategy 
has  been  employed  to  sort  files  of  medium  to  large  sizes  on 
serial  computers. 

4.3.4  Phase  A  -  Segmenting  the  File  into  Bins 

Next,  let  us  turn  our  attention  to  the  data  flow  within 
the  PE's.  The  three  steps  of  Phase  A  are  similar:  The  records 
in  one  file  are  reclassified  into  four  others. 
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In  the  procedure  for  Phase  A,  the  PE's  operate  in  four 

groups  of  lb,  group  g  consisting  of  PE(16g+h)  where  0  <  h  <  16 
and  0  <  g  <  4.  The  buffer  area  occupies  3  x  64  rows.  This 
space  is  organized  into  12  1-page  buffers,  3  in  each  group 
of  PE's. 

The  proposed  procedure  for  separating  the  records  into 
their  proper  bins  is  straightforward.  Register  $R  will,  at 
each  cycle,  hold  as  many  as  four  records,  one  in  each  bin. 
Missing  records  are  replaced  by  zeros.  At  the  beginning  of  a 
cycle,  each  bin  examines  the  contents  of  $R  to  see  if  the 
record  there  belongs  in  that  bin.  If  so,  that  record  is  re¬ 
placed  by  one  that  belongs  in  another  bin,  otherwise  the 
record  is  left  in  $R.  Then  the  cycle  is  completed  by  routing 
$R  16  words  to  the  right.  (If  a  record  is  transferred  from 

$R  to  a  buffer  by  some  bin  we  say  that  the  bin  received  the 
recoid  and  if  the  record  was  transferred  from  the  buffer  to 
$R  we  say  it  was  sent  from  the  bin.) 

The  attentive  reader  will  have  thought  of  some  minor 
complications  that  are  glossed  over  in  the  paragraph  above. 

A  few  of  them  are  addressed  in  the  following  list: 

1.  If  a  bin  can  receive  the  record  it  finds  in 
$R  but  has  nothing  ready  to  send,  it  sends  a 
dummy  record  of  all  zeros. 

2.  If  a  bin  finds  zeros  in  $R  and  has  a  record 
to  send  it  simply  sends  without  receiving. 

3.  A  bin  that  lias  filled  all  its  buffers  with 
records  belonging  there  just  ignores  $R  until 

one  of  its  buffers  has  been  transmitted  to  the 
disk. 

Each  step  is  begun  by  reading  eight  pages,  two  per 
bin,  and  proceeds  to  alternate  writing  and  reading  until  all 
pages  of  the  input  file  have  been  read.  After  this,  there 


will  be  about  eight  pages  remaining  in  the  PEM  to  be  binned 
and  written  out.  The  sequence  of  the  data  will  have  to 
determine  the  order  in  which  the  four  output  files  are  written 

ive  propose  to  keep  a  map  of  the  files  in  which  each 
page  represented  by  a  bit  which  is  0  or  1  corresponding  to 
whether  it  is  the  image  of  a  vacant  page  or  an  occupied  page. 
Ivhen  the  input  map  is  all  zeros,  input  is  complete  and  when 
output  is  complete,  zeros  remaining  in  the  output  maps 
correspond  to  unused  pages  in  the  output  files. 

We  do  not  have  much  to  say  about  timing  but  it  is 
possible  to  estimate  some  upper  bounds.  Note  first  that  the 
rate  at  which  records  fall  into  bins  must  average  at  least  4 
per  3  cycles  because  4  records  are  being  moved  at  a  time  and 
it  takes  at  most  3  cycles  for  a  record  to  find  its  way  home. 
The  fact  that  $R  might  have  a  dummy  record  means  either  that 
some  record  was  read  into  the  bin  where  it  belongs  or  that 
some  bin  has  processed  all  its  buffers.  In  the  first  case, 
a  record  reaches  the  output  buffer  in  one  cycle  and  the  latter 
case  arises  only  when  processing  gets  ahead  of  I/O,  which  is 
of  no  concern.  Neither  case  demands  that  we  revise  our  pes¬ 
simistic  estimate  of  3  cycles  to  process  4  records.  There¬ 
fore  3/4  x  64  -  48  cycles  per  page  is  the  upper  limit  on  the 
amount  of  processing  required.  To  keep  up  with  I/O,  the  cycle 
time  would  have  to  be  (266  us/page)  (16  clocks/ps) / (48  cycles/ 
page)  =  88  clocks/cycle  or  about  40  FINST  instructions  since 
no  floating  point  is  involved.  It  is  hard  to  imagine  using 
more  than  200  instructions  per  cycle,  which  yields  an  upper 
bound  of  (3  Steps)  x  (2.18  sec  for  I/O)  x  (5  processing  time/ 
I/O  time)  =  60  sec  for  «teps  1,  2,  and  3. 
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4.3.5  Phase  B  —  The  Internal  Sort 


Step  4,  which  is  the  only  step  in  Phase  B,  has  64 
sub-steps,  one  for  each  PE.  On  suj-step  4  ,  file  F  is 

.  .  .  P  3  » P 

read  in  its  entirety  into  the  PEM’s,  sorted  and  written  on 
file  F  . 

4  ,  P 

The  sorting  procedure  is  patterned  after  an  algorithm 
called  quicksort  (Hoare,  1961).  The  procedure  begins  with 
the  choice  of  a  key  called  the  pivot  which  is  selected  so 
that  it  will  fall  near  the  middle  of  the  file  after  it  is 
sorted.  In  our  case,  the  keys  have  numeric  significance  and 
they  are  fairly  uniformly  distributed  between  0  and  the  maxi¬ 
mum  so  we  would  simply  take  one  half  of  the  maximum  as  the 
value  of  the  pivot  first  key.  Sampling  techniques  are  used 
in  more  general  cases. 

In  either  case,  records  at  the  beginning  of  the  file 
with  keys  that  exceed  the  pivot  are  interchanged  with  records 
near  the  end  of  the  file  with  keys  that  do  not  exceed  the 
pivot.  By  working  from  both  ends  toward  the  middle,  the 
file  is  subdivided  into  two  subfiles,  the  first  of  which 
consists  of  records  with  keys  that  are  less  than  or  equal  to 
the  pivot  and  the  second  of  which  falls  behind  the  first  and 
consists  of  records  with  keys  that  are  greater  than  the  pivot. 

The  same  procedure,  called  partitioning,  is  applied  to 
the  first  subfile  to  produce  two  more  subfiles,  and  then  the 
first  of  those  is  partitioned,  and  so  forth,  until  the  first  sub¬ 
file  consists  of  just  one  record,  the  first.  At  this  point,  there 
will  be  about  log^M  subfiles,  where  M  is  the  original  file 
length.  The  next  stage  is  not  unlike  the  preceding  onesj 
the  first  subfile  that  has  length  n  >  1  is  partitioned  re¬ 
peatedly,  occasionally  producing  a  one-record  subfile  at  the 
front  which  is  just  appended  to  the  ones  ahead  of  it.  The 
strings  of  records  thus  produced  are  in  ascending  sequence 
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and  eventually  constitute  the  entire  file  at  which  point  the 
sort  is  complete. 

The  proposed  method  for  implementing  quicksort  on 
ILLIAC  IV  begins  by  loading  F  into  the  PEM.  Recall  that 

F3,p  has  at  most  212  records  and  will  occupy  no  more  than  210 
rows,  or,  equivalently,  26  pages.  The  four  bins  of  up  to  210 

records  are  first  partitioned  independently  within  each  bin. 
One  might  well  find  it  advantageous  to  use  special  tricks  for 
improving  PE  utilization,  but  the  method  for  partitioning  in 
each  bin  could  be  very  similar  to  that  used  in  serial  com¬ 
puters.  Having  the  full  20  bit  key  stored  in  each  word  makes 
it  possible  for  the  16  PE's  in  each  bin  to  work  as  a  unit. 
Between  partitionings,  records  would  be  routed  between  bins 
so  that  there  would  be  one  pivot  row  in  the  buffer  with  the 
property  that  all  records  in  lower  numbered  rows  would  have 
keys  less  than  the  pivot  key  and  all  records  in  higher  num¬ 
bered  rows  would  have  keys  greater  than  the  pivot  key.  Re¬ 
cords  on  a  pivot  row  could  have  keys  falling  on  either  side 
of  the  pivot  key. 

Having  partitioned  the  file  once,  the  subfile  of  rows 
numbered  lower  than  the  pivot  row  would  be  partitioned  once 
again.  The  first  pivot  row  would  be  included  in  the  second 
partitioning  because  it  can  hold  low  keys.  The  second 
partitioning  would  again  be  followed  by  routing  to  produce 
a  second  pivot  row.  As  in  the  serial  procedure,  the  first 
subfile,  i.e.,  the  one  with  the  lowest  keys,  would  be 
repeatedly  partitioned  until  the  first  row  cf  the  buffer  be¬ 
comes  a  pivot  row.  The  partitioning  process  is  continued 
further  until  every  row  in  the  buffer  becomes  a  pivot  row. 

Ac  every  stage,  partitioning  is  performed  on  the  first 
subfile  consisting  of  a  set  of  consecutive  rows  in  the  se¬ 
quence:  pivot  row,  one  or  more  non-pivot  row,  pivot  row. 

The  number  of  such  subfiles  at  any  given  point  in  the  piocess 
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Will  seldom  exceed  10  if  there  are  210  rows  in  the  buffer. 

Thus,  little  space  is  required  to  keep  t-ack  of  the  pivot 
rows . 

The  file  will  rarely  be  in  order  after  being  parti¬ 
tioned  as  described  above  but  permuted  elements  will  always 
be  on  consecutive  rows  so  they  cannot  be  separated  by  more 
than  six  intervening  records.  The  remaining  unraveling  is 
left  to  step  5  along  with  the  merging.  The  pages  of  file 
Fm,P  consist  of  records  from  16  consecutive  full-length  rows 


4*3.6  Phase  C  -  Reassembling  the  Pat 


Step  5  is 
the  64  files  and 
from  file  F 

4,P 

coding  tor  this 
pulating  one  row 


started  by  reading  the  first  page  of  each  of 
rearranging  the  records  so  that  those  coming 
fall  in  PHp,  0  <  p  <  64.  Supposing  that  the 
operation  contains  100  instructions  for  mani- 
of  information,  the  time  required  for  the 


rearranging  would  be  (100  instructions/row)  x  (1024  rows/ 
buffer  load)  x  (2  clocks/instruction)  r  (16  clocks/us)  = 

12.8  x  io3  ys. 


There  still  remains  a  little  sorting  to  do  because  the 
quicksort  of  step  4  left  some  records  as  much  as  six  positions 
away  from  their  final  positions.  A  variant  of  the  classical 
binary  merge  procedure  will  work  well  here  and  only  four  re¬ 
cords,  64  rows,  of  extra  storage  are  required.  The  first  two 
steps  of  the  binary  merge  would  be  carried  out  exactly  as  for 
a  full  sort  to  produce  sorted  sequences  of  4  records.  The 
variation  comes  in  the  third  and  final  step.  It  begins 


normally  with  a  merge  of  the  first  two 
duce  a  string  of  8,  but  then  the  last 
of  8  are  merged  with  the  third  string 


strings  of  4  to  pro- 
4  records  of  the  string 
of  4  to  produce  another 


string  of  8.  The  first  half  of  the 
is  stored  just  behind  the  first  half 
8  and  the  second  half  is  then  merge d 


preceding  string  of  8 
of  the  first  string  of 
with  the  next  string. 
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The  binary  merge  will  take  less  time  than  the  13  ms 
it  took  to  get  records  in  their  proper  PE's  so  the  whole 
processing  time  for  step  5  should  be  less  than  25  ms. 

4.4  TIME  STEPPING 

4.4.1  Algorithm 

A  good  design  for  stepping  numerical  stress  waves 
along  in  time  is  most  important  for  the  3-D  SWIS  code,  since 
it  is  this  process  that  will  account  for  essentially  all  of 
the  execution  time.  The  time  stepping  process  being  described 
is  the  repeated  calculation  of  Eq.  (2.18)  taking  advantage  of 
the  parallel  processing  capabilities  of  the  ILLIAC  IV. 

The  first  three  terms  of  Eq.  (2.18)  involve  column 
vector  operations  which  are  relatively  minor.  The  matrix  [M] , 
containing  the  nodal  masses,  is  diagonal  and  thus  trivial  to 
invert.  All  these  operations  are  easily  adapted  to  parallel 
processing.  The  terms  involving  0  are  also  easily  computed. 
The  column  vector  resulting  from  this  operation  is  multiplied 
by  the  very  large  banded  sparse  matrix  [A]  (which  is  defined 
in  Eq.  (4.4)).  This  multiplication  by  [A]  will  account  for 
nearly  all  the  execution  time  that  is  required  to  complete 
one  numerical  time  step. 

The  following  scheme  for  banded  sparse  matrix  multi¬ 
plies  was  developed  by  Frazier  (1972).  Since  the  matrix 
is  of  order  3  x  104,  it  is  necessary  to  compress  it  to  a 
manageable  size.  We  note  that  in  a  3-D  gridwork  of  bricks 
most  nodes  have  just  26  immediate  neighbors;  consequently/ 
for  each  component  (i,j)  there  will  usually  be  27  non-zero 
terms  in  a  single  row  of  [A].  The  unnecessary  zeros  are 
compressed  out  of  the  rows  of  [A]  to  yield  a  matrix  N  by 
27  (N  being  the  total  number  of  nodes  in  the  grid).  From  the 
node  numbering  sequence  we  can  deduce  the  column  numbers  for 
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N 


the  non-zero  terms  in  each  row,  i.e., 


m 


n 


k 


1,2, 

1,2, 


27 


(4.6) 


where  n  and  m  are  the  row  and  column  numbers  of  [K] , 
respectively,  and  N  is  the  total  number  of  node  points. 
The  array  of  contributing  column  numbers  mn  k  *  1,2, 

~  2.  are  simply  the  node  numbers  adjacent  tc  node  n. 

Only  the  non-zero  multiplications  are  performed  in 
the  sparse  matrix  multiplication  which  is  expressed  by 


for  n  =  1,2 , 


■27 

~n  =  £  Vk  (V 

k*=l  n»K 


N  with 


(4.7) 


a 


*=n ,  m 


n  ,k 


K 


=n,m  , 

*  n,k 


(4.8) 


The  sparse  matrix  multiplication  is  performed  at  each 
time  step  with  the  sparse  matrix  remaining  unchanged  for  the 
special  case  of  linear  wave  simulation;  consequently  con¬ 
siderable  effort  can  be  devoted  to  arranging  the  non-zero  terms 
of  the  sparse  matrix  on  the  mass  storage  unit  in  an  optimal 
fashion  for  processing.  The  compressed  matrix  [A]  should  be 
arranged  on  the  disk  so  that  each  term  arrives  in  the  proces¬ 
sor  containing  the  nodal  displacement  fo1  which  it  is  to  be 
multiplied,  Eq.  (4.7).  Thus,  a^  ^  should  arrive  in  the 
PEM  containing  u^  k(ta)  without ’requiring  additional  shift 
operations.  The  re6rdering  of  [A]  is  done  on  a  serial 
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machine  independent  of  the  ILLIAC  IV  in  the  first  implementa¬ 
tion  of  SWIS.  For  larger  problems,  the  previous  section  de¬ 
scribes  how  the  reordering  can  be  performed  by  the  ILLIAC  IV. 
The  following  discussion  defines  the  sparse  matrix  multiply 
process  and  the  flow  of  information  from  disk  memory  to  the 
array  in  detail. 


The  nodal  displacements  u 

— m 


are  vectors  of  three  com¬ 
ponents  and  require  three  numbers  for  their  representation. 
Let  uim  be  the  component  of  u^  along  the  x.  axis, 
i  =  1,2,3.  Correspondingly,  a  .  is  a  3  x  3  matrix  that 


Correspondingly,  an  ^ 
requires  nine  numbers  in  its  representation.  Let  a\ 
denote  the  ij  element  of  this  matrix.  Then  Lq.  (4.7)  can 
be  written 


~27  3 

Xi  2^ain,jk  uj 

k-1  j-i 


m 


n ,  k 


u 
—1 3 


(4.7") 


where  b-n  is  the  component  of  bn  along  the  x.  axis. 

The  nodal  displacements  at  ta  are  arranged  on  the 
disk  to  flow  inco  the  PEM's  (denoted  p)  by  PE  rows  (denoted 
r)  so  that  Uii'>p  =  0,r  =  0;u  +  p  =  0 ,  r  =  1 ; 

'*  *  P  "  0.  r  -  2;  u  -  p  =  1,  r*  =  0;  ...  u  -  p  =  63, 

„  ^  _  n2l_  _  .  “(64)3 


r  =  2;  u,  , 

—  (  6  5  J  3 


p  -  0,  r  =  3;  etc . 
In  general  we  have 


u. 

jm 


P(m-l)  =  (m-1)  -  64 

64 


r  =  3 


m- 1 


j-1 


(4.9) 


where 


m-1 


W 

mainder  of 


m  .  ™  U1  Vls lon  and  p(m)  is  the  re 

5T|  as  in  Section  4.2.  This  storage  configura¬ 
tion  in  the  PE's  is  achieved  by  loading  u-  (t  ),  m  =  1,2, 

...  N,  on  the  disk  in  the  sequential  order  U  a  S  =  1  7 
where  s’  99 

64 


3N 


s  =  m  +  128 


m-  1 

vr 


64 ( j - 1 ) 


(4.10) 


+ 


j  =  1,2,3  . 


The 

containing 


condition  for  ain  to  arrive 
u-  (t  ) 

Jmn,h  a  1S  cxpressed,  using 


ain,jk  *  P  (“n.k’1' 


in  the  processor 
Eq.  ,4.9),  as 


(4.11) 


I  he  Ph  row  number  r  to  be  occupied  by  the  various  non¬ 
zero  terms  of  the  sparse  matrix  is  somewhat  more  difficult  to 
expiess  because  of  the  arbitrariness  of  the  node  numbering 
scheme.  The  node  numbers  n  should  increase  monotonically 
(but  not  necessarily  sequentially)  with  increasing  row  number 
r  in  each  PI:  so  that  the  row  number  n  of  the  sparse  matrix 
can  be  processed  in  ascending  order.  However,  in  general,  no 

more  than  27  of  the  64  Pb's  will  contain  a  u.  (t  )  for  which 
—  •  9  mn  k  ^ 

ain,jk  1S  non-zero  for  any  particular  matrix  row  number  n. 

That  is,  only  about  one-third  of  the  PE's  will  contain  nodal 
displacements  that  are  adjacent  to  node  number  n  in  the 
spatially  zoned  continuum.  Furthermore,  a  single  PE  may,  in 
some  instances,  contain  more  than  one  neighbor  nodal  displace¬ 
ment  but  rarely  more  than  nine. 


When  performing  the  multiplications  that  contribute  to 
matrix  row  number  n,  there  is  no  need  to  make  the  noncontri¬ 
buting  PE's  inoperative.  Each  nonccnt ribut ing  PE  can  simul¬ 
taneously  perform  multiplications  for  the  next  higher  matrix 
row  number  for  which  the  PE  will  have  a  contribution.  If  the 
compressed  matrix  [A]  is  loaded  into  the  PE's  in  the  proper 
sequence,  this  work-ahead  scheme  can  be  carried  out  by  perform¬ 
ing  multiplications  in  each  PE  in  the  sequence  that  the  [A] 
terms  are  loaded  from  mass  storage.  This  desired  storage  con¬ 
figuration  in  the  various  PE's  is  achieved  by  loading  a.  .j.; 
n  -  1,2,  ...  N;  k  =  1,2  ...  »27,  on  the  disk  in  the  sequential 

6S 


order  ag ,  S  =  1,2 ,  ...  s  10  x  27  x  N  where 

S  =  64r  +  (p+1)  +  64(3i+j-4)  ,  (4.12) 

i  and  j  =1,2,3  , 


in  which  r  =  0,  1,2  ...  s  10  x  27  x  N/64  and  p  =  0,  1,2  ...  63 
are  the  row  and  PE  number,  respectively.  The  PE  number  is 

p^mn,k’ *  The  r0w  number  each  PE  is  expressed  by  summing 
all  previous  entries  in  the  particular  PE,  i.e.,  the  row  num¬ 
ber  for  PEp  is  expressed  in  terms  of  n  and  k  by 


where  6pp.  =0  for  p  f  p'  and  5pp.  =1  for  p  =  p'  and 
where,  just  as  in  Eq.  (4.11) 


p'=P(V,k'~‘)  •  (4.14) 

Every  10th  term  in  the  array  starting  with  S  =  1  is  used 

to  store  two  index  numbers:  m  =  m  .  to  identify  the  nodal 

II  p  l\ 

displacement  that  is  to  be  multiplied  and  n  to  identify  the 
matrix  row  to  which  the  multiplication  contributes,  Eq.  (4.7). 

Using  the  storage  schemes  defined  above,  i.e.,  starting 
from  a  point  in  the  computations  in  which  (U  (t  )}  and  [A] 
appear  in  their  prescribed  sequences  on  the  disk,  the  sparse 
matrix  multiplication  of  Eq.  (4.7)  proceeds  as  follows: 

i •  (U  (t^) }  is  loaded  into  core  by  PE  rows  using  a 
single  access. 

2.  The  serially  arranged  version  of  [A],  defined 

Eqs.  (4.11)  -  (4,14),  is  accessed  and  its  loading 
is  initiated.  The  terms  flow  into  core  lv  PE 
rows  starting  with  row  0.  After  30  rows  are 
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filled  the  loading  is  continued,  uninterrupted, 
back  at  row  0  overwriting  the  previously  loaded 
terms.  The  computations  and  manipulations  of  the 
following  steps  are  carried  out  before  the  matrix 
terms  are  overwritten. 


3.  Initialize  r 


4. 


10;  r2  =  0;  br  =  0  for 


r  ■  0,  1,2,  ...  99;  no  =  1. 

Three  sets  of  three  multiplications  and  product 
summation  are  performed  simultaneously  in  all 
processors . 


b  .  =  b  .  + 
Tz  + 1  r2  + 1 


j=l 


ro+j+3i-3  ri+j-l;  1  =  1*2*3 


where 

r  =  (r  +10)  / 1 - 6  ) 

0  o  \  2  o ,  r  / 


m-1 

~5T~ 


JV  lf  n  ■  br; 

I  r  +4  ,  i  f  n  >  b 

O  9  ' I 


n  =  a^  (first  32  bits) 
o 


m  =  ar  (second  32  bits) 


Also,  store  matrix  row  number  of  the  product  contribu¬ 
tion 


2 

Set  n  =  n  +  1 
0  0 


*7 


Check  for  the  completion  of  matrix  row  number  n 

If  nQ  =  MIN (n)  return  to  step  (4);  otherwise,  ° 

i.e.,  n^  <■  MIN(n))  continue  on  to  step  (6). 

MIN(n)  is  the  minimum  b  among  all  64  PE's. 

0 

Perform  a  row  sum  on  b.;  i  =  1,2,3  among  those 

PE  s  for  which  b^  =  n  .  With  only  PEp  operative, 

shift  the  result  to  b°  +  ,  i  =  1,2,3  where 

i 

P  =  p(n  -1) 

0 


Operating  only  those  PE's  for  which  b  =  n  set 
br  =  br+4  for  r  “  0,  1,  . ..  99  . 

If  the  next  ten  PE  rows  of  the  [A]  matrix  have 
been  loaded,  return  to  stop  (4);  otherwise  return 
to  step  (5)  . 

The  computations  and  manipulations  of  steps  (4)  -  (7) 
are  displayed  in  Table  II. 

The  time  stepping  scheme  without  the  damping  terms  in¬ 
volving  6  has  been  implemented  in  GLYPNIR  and  tested  on 
the  B6700  simulator.  Chapter  V  contains  a  description  of  the 
test  problem  and  results.  The  following  section  discusses 
schemes  for  outputting  and  representing  the  results  from  the 
Illiac  SWIS  code. 

4.4.2  Ideal  Timing 

Since  the  sparse  matrix  multiply  used  in  the  time 
stepping  scheme  is  time  consuming,  some  consideration  is 
given  to  performance  of  the  SWIS  code  with  parts  of  the  logic 
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TABLE  II 

COMPUTATIONS  AND  MANIPULATIONS  OF  STEPS  (4)  -  (7) 

1  =  i*2*3!  n  =  1*2,3  ...  N 

u" 


pr» 

I 


u. 


♦ 


pr 


P  =  0,1,2,  ...  63 
=  (S-l)  -  64 


S-l 

5T” 


=  p(m-l) 


r  =  R  ,  R  +1,  R  +1,  ...  R  +3 
0  0  0  o 


N+63 

~TT 


R  + 

o  64  1 


R  +3 
o 


m- 1 
64 


+  i-1 


S  =  1,2,3,  ...  *  3N 
m- 1 


ij, jm» 


=  m+128  |  +  64  C i - ID 

(n.m)  =  1,2,3,  ...  N;  i  =  1,2,3;  j  =  1,2,3; 


in, jk*  n  1*2,3  ...  N; 


k  =  1,2,3,  ...  27 

m  =  m  ,  data 
n ,  k 


r  = 


-  R  ,  R  +1,  R  +2,  . . .  s  R  +  10  x  27  x  ! 
111  i  6 


R 


n-1 

=27  k-1 

vioZ 

n'  =  l 

£  v*10  £ 

k  *1  ”i 

0,1,2  ...  63 

P(mn,k’^ 

p'  =  P^n'.k'-1) 


as  S  =  1,2,3,  ...  = 


=  64r  +  (p+1)  +  64(3i+j-4) 


69 


which  have  been  coded  in  ASK.  A  preferable  manner  to  imple¬ 
ment  SWIS  would  be  to  code  the  outer  parts  of  the  program  in 
GLYPNIR  for  ease  in  maintenance  or  modification.  The  critical 
parts  of  the  sparse  matrix  multiply  in  the  time  stepping 
section  would  be  coded  in  ASK  to  speed  up  program  execution. 
This  discussion  will  compare  estimated  timings  of  various 
logical  section:,  of  SWIS  written  in  ASK  with  the  simulator 
generated  timing  for  the  same  section  coded  in  GLYPNIR. 

GLYPNIR  timing  for  three  logical  steps  in  the  code  were 
given  in  Section  5.2.  The  only  logical  step  which  consumed 
a  non-trivial  quantity  of  time  was  step  3,  involving  the 
partial  matrix  sums  and  rowsumming. 

The  time  through  step  3-a  in  GLYPNIR  was  44  ys.  The 
time  estimate  for  the  same  logic  in  ASK  is  roughly  11  yS, 
giving  a  factor  4  advantage.  The  GLYPNIR  time  for  a  rowsum 
was  92  ys,  while  the  estimate  for  the  ASK  version  is  40  ys. 

following  the  computations  given  in  Section  5.2  for  a  104  node 
problem: 

1.  Partial  sum  time 

3  X  11  y s/block  =  33  ys/block 

2.  7  nodes/block  x  40  ys/node  =  280  ys/block 

3.  Total: 

(33ys+280ys)/block  x  104nodes  „  ,  , 

1  nodes/block  -  =  °*44  sec/time  step 

Thus  it  appears  possible  to  roughly  double  the  execution  speed 
of  the  code  by  programming  in  ASK. 

4.5  COMPUTER  RESULTS 

If  we  are  to  take  full  advantage  of  each  seismic  cal¬ 
culation,  large  quantities  of  digital  results  must  be  stored 
for  post  ILLIAC  processing  and  displaying.  Again  alluding 
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to  our  reference  problem  involving  a  lO'-node  grid,  we  note 
that  five  time  intervals  of  full  grid  output,  nodal  displace¬ 
ments  and  velocities,  involves  3  x  10s  words  of  digital  data 
or  about  10'  storage  bits.  Transporting  these  quantities  of 
data  through  the  ARPA  net  at  S  x  lo"  bits/sec,  assuming  100 
percent  occupancy  of  the  net  without  interrupts,  would  require 
about  5  minutes,  of  course,  the  ARPA  net  communication  systen 
would  be  more  effectively  used  to  interrogate  the  computed 
output.  If  the  computed  results  actually  need  to  be  shipped 
out  of  the  Illiac  site  for  post  processing  and  displaying, 
magnetic  tapes  would  better  serve  the  purpose. 

Grid  sizes  considerably  greater  than  104  nodes  are 
anticipated.  Also,  stress  time  histories  will  be  computed 
m  addition  to  the  displacement  and  velocity  fields.  Con¬ 
sequently,  the  more  data  reduction  that  can  be  accomplished 
at  the  Illiac  site,  the  better.  It  appears  to  us  that  the 
UN I  CON  mass  storage  device  has  considerable  potential  as  a 
Plot  device  due  to  the  high  resolution  that  is  achieved  in 
writing  on  the  film  strips.  Optical  techniques  could  be 
easily  employed  to  enlarge  and  print  the  film  plots  after 
they  left  the  Illiac  site.  Such  a  plot  capability  would 
prove  an  asset  to  many  of  the  Illiac  users. 
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V.  TEST  CALCULATIONS  USING  3-D  ILLIAC  CODE 


5.1  PROBLEM  SET  UP  AND  RESULTS 

For  the  purpose  of  debugging  and  testing  GLYPNIR 
coding  on  the  UCSD  simulator,  a  simple  rectangular  rod  is 
analyzed.  A  step  load  is  applied  axially  at  one  end  of 
the  rod  as  pictured  in  Fig.  5.1,  and  the  resulting  stress 
wave  is  allowed  to  propagate  to  the  end  of  the  rod.  This 
problem  is  modeled  as  a  four  element  grid  with  20  nodes  as 
shown  in  Fig.  5.2.  The  Fortran  version  of  SWIS  (run  on 
S3's  UNIVAC  1108)  performs  the  explicit  time  stepping  de¬ 
veloped  in  Section  II,  Eq.  (2.18).  With  the  damping  co¬ 
efficient  8  set  to  zero  we  obtain: 

{ U  (t+A  t) }  =  + At 2  [M] " 1 { F (t ) }  -  fU(t-At)} 

+  2(u(t)}  +  [A]  (U(t) }  (5.1) 

where 

[A]  =  -  At 2  [M]  "  1  [K]  (5.2) 

which  has  been  programmed  for  operation  on  the  ILLIAC  IV. 

In  addition,  the  Fortran  code  also  arranges  the  nonzero  terms 
o f  the  [A]  matrix  according  to  their  desired  location  in 
the  ILLIAC  processors  in  preparation  for  the  sparse-matr ix 
multiply  [A]{U(t)}  (Eq.  5.1),  The  arranged  terms  are  then 
transferred  from  the  UNIVAC  1108  to  the  UCSD  simulator  where 
parallel  time  stepping  computations  are  performed. 

The  first  simulations  ran  only  one  time  step  to  test 
the  sparse -matrix  multiply  algorithm.  Two  node  numbering 
schemes  were  used  to  confirm  the  generality  of  the  code. 

The  number  of  operative  processors  was  reduced  to  eight 
for  this  test  problem  in  order  to  generate  a  nontrivial 
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X 


as  a  Step  Function  in  Time 

y 


P-wave  Velocity  =  10. 0  km/sec 
S-wave  Velocity  =  5.0  km/sec 
Density  =  2.0  g/cm3 
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Fig.  5 . 1 - -Tes t  Problem  1:  Uniaxial  wave  propagation 
in  a  rectangular  rod. 


(a)  Node  Numbering  Scheme  A 


15  7 


—*  y 


(b)  Node  Numbering  Scheme  B 


ode  numbering  schemes  for  Test  Problem  1: 
ial  wave  propagation  in  a  rectangular  rod. 


simulation  with  more  than  one  nodal  displacement  associated 
with  each  processor. 

As  depicted  in  Table  5.1,  the  two  node  numbering 
schemes  result  in  a  markedly  different  arrangement  of  the 
nonzero  terms  of  the  [A]  matrix  in  the  eight  processors; 
however,  we  note  that  matrix  row  numbers  (n)  increase  mono- 
tonically  within  each  processor.  This  is  to  assure  the 
completion  of  lowest  uncompleted  row  of  the  sparse  matrix 
multiplication  at  the  earliest  sequential  step  possible 
within  the  constraints  of  the  node  numbering  scheme. 

In  node  numbering  scheme  A,  we  note  that  each  of 
the  eight  processors  contributes  to  row  one  of  the  sparse 
matrix  multiplication,  and  the  processor  rowsum  for  matrix 
row  one  is  performed  directly  following  the  first  set  of 
parallel  multiplications  and  accumulations.  Whereas  for 
scheme  B,  only  the  first  three  processors  contribute  to 
matrix  row  one.  This  causes  the  processor  rowsum  for 
matrix  row  one  to  be  delayed  a  few  steps.  However,  this 
results  in  no  loss  of  efficiency  since  the  remaining  five 
processors  are  working  ahead  on  matrix  row  numbers  3,  4,  5 
and  6.  A  loss  of  efficiency  does  occur  near  the  completion 
of  the  matrix  multiplication  since  some  of  the  processors 
have  finished  while  other  processors  are  still  computing. 
This  effect  would  be  negligible  for  large  problems. 

The  simulation  of  the  full  time  stepping  process 
was  performed  with  node  numbering  scheme  B  in  Table  5.1 
on  a  subset  of  the  problem  described  above.  The  simulation 
was  initialized  with  the  wave  part  way  down  the  rod  and  run 
for  four  time  steps. 

The  results  of  this  run  are  depicted  in  Fig.  5.3. 
Solid  lines  on  the  graph  indicate  the  four  time  steps  run 
on  the  simulator.  The  numerical  results  of  the  simulation 
were  in  exact  agreement  with  the  results  from  the  UNIVAC  run 
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TABLE  5.1 

ARRANGEMENT  OF  SPARSE  MATRIX  INFLUENCE  COEFFICIENTS 
FOR  TEST  PROBLEM  1  IN  AN  EIGHT-PROCESSOR  SIMULATION 

Node  Numbering  Scheme  A 


Displacement,  km 


for  the  same  times.  Thus,  the  curves  from  the  comparison 
run  exactly  superimpose  the  curves  of  the  simulation. 

5 . 2  TIMING  RESULTS 

Timing  information  for  the  GLYPNIR  version  of  SWIS 
was  provided  by  the  simulations  on  the  UCSD  B6700.  As  the 
sparse-matr ix  multiply  accounts  for  over  97  percent  of  the 
time  spent  per  time  step,  the  timing  of  the  matrix  multiply 
is  analyzed  in  some  detail.  All  times  are  ILLTAC  IV  execu¬ 
tion  times  as  given  by  the  simulator.  The  computations  for 
one  time  step  can  be  broken  into  three  logical  sections. 

1.  Time  for  loop  control  and  bookkeeping: 

^  0.05  ms 

2.  Time  for  adding  terms  of  Eq.  (5.1)  not  involv¬ 
ing  the  sparse-matrix  multiply: 

0.14  ms 

3.  Matrix  multiply  time: 

a.  Time  to  perform  the  partial  sum  consisting 
of  a  3  x  3  submatrix  of  [A]  times  three 
components  of  { U ( t ) } : 

44  ps 

b.  Time  to  rowsum  the  partial  sums  for  a  com¬ 
pleted  row  of  [A]  x  { U f t ) } : 

92  ps 

The  sparse-matrix  multiply  performed  in  the  GLYPNIR 
simulation  does  a  10  row  simulated  ILLIAC  disk  read  for  each 
time  through  step  3-a.  The  10  rows  contain  one  3x3  sub¬ 
matrix  of  [A]  for  each  PEM.  Since  the  minimum  read  avail¬ 
able  on  the  ILLIAC  disk  is  16  rows  (1  page),  a  simple  ex¬ 
tension  of  the  code  will  be  used  on  the  ILLIAC  in  which 
two  pages  are  read  at  one  time.  One  block  of  two  pages 
(32  rows)  just  contains  the  30  rows  necessary  for  three 
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3x3  submatrices  per  PEM.  This  will  allow  three  partial 
sums  (step  3-a)  for  each  disk  read.  The  times  given  for 
the  partial  sums  in  the  matrix  multiply  may  simply  be 
multiplied  by  three  to  give  the  computation  time  for  a 
two  page  read. 

These  figures  can  now  indicate  the  overall  machine 
time  for  a  typical  problem  run  on  the  GLYPNIR  code.  Consider 
a  grid  containing  104  nodes.  This  is  a  large  problem  for 
common  serial  machines,  but  is  small  enough  to  contain  all 
nodal  displacements  (U(t)}  and  (U(t+At)}  in  the  PEM's 
(6  nodal  disp.lacements/node)  . 

The  influence  coefficients  [A]  for  seven  nodes 
will  fit  into  a  two  page  block  read  from  disk: 

(64  PE's)  x  (3  3x3  matrices/block/PE)  64  x  3 

(27  neighbors/node)  "  =  '27  ”  = 

Tne  total  sparse-matrix  multiply  time  for  a  104  node  problem 
for  one  time  step  is  then: 

1.  Partial  sum  time: 

3  x  44  y  s/b lock  =  132  ys/block 

2.  Time  for  summing  partial  sums: 

7  nodes/block  x  92  ys/node  =  644  ys/block 

3 .  Total: 

(132  ys  +  644  ys)/block  x  104  nodes  ,  . 

7  nodes/block -  =  1A  sec/time  step 

(the  execution  time  for  other  equation  terms  and 

o  erhead  are  negligible) 

The  time  given  is  for  GLYPNIR  implementation  of  the 
code  without  consideration  of  the  time  taken  for  I/O.  Using 
a  double  buffering  scheme,  the  only  extra  time  consumed  would 
be  execution  of  ^ode  for  I/O  control  and  bookkeeping.  The 
additional  code  should  not  increase  the  estimated  time  beyond 
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two  seconds/time  step  for  the  101*  node  problem.  It  is  also 
possible  to  speed  up  the  code  by  implementing  the  inner  loop 
of  the  matrix  multiply  in  ASK. 


VI.  SUMMARY  AND  CONCLUSIONS 


A  long-standing  controversy  between  FE  and  FD  methods 
has  been  partially  resolved.  Both  test  calculations  and 
theoretical  comparisons  bear  evidence  that  the  two  numerical 
methods  are  very  similar  in  nature.  The  spatial  difference 
coefficients  for  the  simplest  FD  scheme  are  identical  to  those 
of  the  FE  method  using  tetrahedral  elements  in  a  uniform 
rectal inear  grid.  The  difference  coefficients  obtained  using 
trilinear  FE  bricks  are  considered  to  be  superior  to  those 
obtained  from  the  cell-centered-stress  FD  meti.od  because  of 
an  instability  that  exists  in  this  FD  scheme.  The  difference 
coefficients  for  the  FE  brick  reduce  to  those  for  the  cell- 
centered-stress  FD  method  when  a  one-point  integration  procedure 
(implying  uniform  strain  energy  density  within  an  element)  is 
employed  in  evaluating  the  element  stiffness  matrix  for  the 
FE  brick.  And,  conversely,  a  FD  method  is  described  that 
results  in  the  difterence  coefficients  for  the  FE  brick  using 
exact  integration  for  the  element  stiffness  matrix. 

The  major  difference  in  conventional  FE  and  FD  computer 
codes  is  in  the  processing  of  the  numerical  difference 
equations.  We  are  adopting  an  explicit  time  stepping  procedure, 
which  is  generally  associated  with  FD  codes  and  a  type  of 
artificial  damping,  which  is  almost  exclusively  associated 
with  FE  codes.  Also,  we  have  developed  and  tested  a  simple 
non-reflecting  boundary  condition  which  is  equally  applicable 
to  both  FE  and  FD  codes. 

Test  calculations  have  been  performed  on  S3's  UNIVAC 
1108  co  compare  existing  FE  and  FD  codes.  Comparisons  were 
made  for  two  problems:  a  spherically  symmetric  explosion 
with  an  exponentially  decaying  step  pressure  and  Lamb's 
problem- -an  impulse  loading  applied  to  the  free  surface  of 
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a  homogeneous  half  space.  Neither  method  displayed  note¬ 
worthy  superiority  on  the  two  problems  presented.  Further 
comparisons  may  uncover  advantages  of  one  method  over  the 
other  for  special  problem  conditions. 

Several  machine  computations  have  also  been  performed 
using  the  SWIS  code  to  test  certain  aspects  of  the  3-D  time 
stepping  algorithm  such  as  accuracy,  artificial  damping, 
computing  economy,  and  the  nonreflecting  boundary  condition. 

Very  satisfactory  results  were  obtained  for  the  case  of  a 
step  loading  applied  to  a  chain  of  100  brick  elements. 

A  parallel  processing  time  stepping  algorithm  for 
propagating  elastic  waves  has  been  developed  and  tested  on 
the  UCSJ  simulator.  A  stress  wave  generated  by  a  step  loading 
applied  to  a  four-element  chain  of  brick  elements  was 
propagated  part  way  down  the  element  chain  using  S3's  UNIVAC 
computer.  At  an  intermediate  point  in  the  serial  machine 
calculations,  the  stress  wave  calculations  were  transferred 
to  the  parallel  processing  algorithm  where  four  time  steps  were 
performed.  The  results  from  the  parallel  algorithm  were  compared 
with  the  serial  machine  results  to  verify  the  GLYPNIR  code. 

Other  major  accomplishments  that  were  completed  during 
the  period  April  1972  through  December  1972  and  reported  on 
in  this  report  include:  the  development  of  a  3-D  grid  generator, 
the  preliminary  development  of  3-D  plot  capability,  and  the 
development  of  sparse  matrix  multiply  and  sort  algorithms 
for  processing  on  the  ILLIAC  computer. 

The  adaptation  of  the  ILLIAC  IV  system  for  processing 
elastic  wave  calculations  is  just  beginning.  Based  on  timing 
estimates,  the  ILLIAC  code  is  expected  to  be  extremely  fast, 
requiring  about  0.4  seconds  per  time  step  for  a  104-node  grid. 

In  addition,  there  are  virtually  no  geometric  restrictions  of’ 
the  type  that  would  preclude  local  mesh  refinements  or  the 
treatment  of  odd-shaped  solids. 


Further  development  is  needed  to  extend  the  elastic 
stress  wave  code  for  treating  nonlinear  materials.  Also, 
imaginative  graphical  techniques  need  to  be  employed  for 
displaying  results  from  stress  wave  calculations. 
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appendix  a 

FINITE  ELEMENT  FORMULATION 

A. 1  VIRTUAL  WORK 

Conservation  of  momentum  for  an  arbitrarily  nonlinear 
material  is  expressed  at  an  instant  in  time  by  the  virtual 
work  expression 


f  (  d“i  36u-  \  r 

7  3T-  6ui  ♦  °ij  35^  -  Vui)  dv  -  /  h 


ds  =  0 


CA.l) 


in  which  6Ui  isji  virtual  displacement,  I  is  a  specified 
body  force,  and  Ti  is  a  specified  traction  applied  to  the 
surface  SQ.  This  is  the  energy  principle  used  to  develop 
nonlinear  procedures  by  the  FE  displacement  method.  If  we 
restrict  ourselves  to  the  small  displacement  theory  of  linear 
elasticity,  the  virtual  expression  reduces  to 


/ (pu  6u.  +  E.,..  Suj  36ui  ,  \  f 

•v'  J  5*1  '  ri  6Ui)dV  "  J  Ti  {ui 


ds  =  0 


where  E^^  contains  the  elastic  modulii.  For  isotropic 
materials,  we  have 


(A. 2) 


Eikjl  =  U  Sij6kl  *  •*  5il«kj  *  *  SifcSj! 

A. 2  FINITE  ELEMENT  SPATIAL  REDUCTION 

A  single  approximation  is  employed  in  conventional  finite 
element  procedures  regarding  the  behavior  of  the  dependent 
variable  within  the  region  of  each  element.  In  this  develop¬ 
ment  we  approximate  the  displacement  field  in  an  element  by  a 
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strains , 


£  fx,t) 
ij  ~ 


and  stresses, 


aij  (£»*) 


ljkl  Tx.  *  C  t ) , 


(A. 9) 


Virtual  fields  are  also  expressed  in  terms  of  the  polynomial 
interpolation  field.  For  example,  the  virtual  displacement 
field  appearing  in  Eq.  (A. 2)  is  simp.y  denoted 

6ui(x,t)  =  <V(x)^>  j  5  U  ^  ( t )  j  .  (A. 10) 

In  order  to  assure  continuity  in  the  displacement  field 
across  the  interface  of  adjacent  elements,  the  interpolation 
functions  must  satisfy  the  condition  that  displacements  along 
that  portion  of  an  element  boundary  common  to  an  adjacent 
element  only  depend  on  the  nodal  displacements  located  on 
that  boundary.  That  is,  a  nodal  displacement  must  have  no 
influence  on  the  surface  displacements  at  the  opposite  end  of 
the  element.  When  this  requirement  is  fulfilled  the  displace¬ 
ment  field  on  an  element  surface,  as  deduced  from  Eq.  (A. 4), 
becomes 


V; .»)  -  £  »„(!)  «.n(t)  (a. id 

n=l 

or  in  matrix  form 

ui(^»t)  =  |Ui(t)|  (A. 12) 

for  x  on  element  boundary  b.  The  following  section,  "Iso¬ 
parametric  Elements",  describes  how  this  requirement  is  ful¬ 
filled,  even  when  using  curved  elements. 
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The  polynomial  approximation  to  the  element  displace¬ 
ment  field  is  incorporated  into  the  governing  virtual  work 
equation,  Eq.  (A. 2),  to  yield 


E 


e=l 


T 


/  <Or  pCO"  I5!!- 

ve 


E  Is11? 


e  =  l 


e  T 

<  34>  >  P 

3x j  n i j  kl 


a  c  > 

3x, 


dv  U 


b  =  l 


dS 


(A. 


in  which  volume  integrals  are  carried  out  element  by  element, 

E  denoting  the  total  number  of  elements,  and  similarly  the 
surface  integral  is  carried  out  surface  by  surface,  B  de¬ 
noting  the  total  number  of  element  surfaces  on  S^. 

The  elements  are  now  assembled  by  joining  the  noder 
common  to  various  elements.  Symbolically  this  can  be  accom¬ 
plished  by  expressing  element  and  boundary  nodes,  in  terms  of 
the  global  system,  i.e., 

|UM  *  't61  |Ui|  <A- 


where  IT  is  a  column  listing  of  all  of  the  nodal  displace- 

n 

ments  that  appear  in  the  entire  continuum.  Each  row  of  [T  ] 
has  just  one  nonzero  term  which  is  unity.  The  term  is  located 
to  pick  out  the  node  from  the  global  system  that  corresponds 
to  the  element  node  associated  with  that  row.  Similarly,  the 
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transformation  from  the  global  numbering  scheme  to  the  surface 
numbering  scheme  is  expressed 


1  (1^  )  =  1  (  IJ  I 

(  i I  11  (if 


Equation  (A. 13)  is  then  expressed  globally  using  Eqs.  (A. 14) 
and  (A. 15)  to  obtain 


where 


6»ijT  ([MlJSjj  *  [KjjljUj 


F. 

l 


E 

[ 

e  =  l 


[M]  =  ^  [Te  ] T  [Me  ]  [Te  ] 


[MC]  -  /  <♦*>*„<♦•> 


dV 


V' 


[«ij]  -  ^  [Te]T|K®j)[TeJ 


dV 


e  =  l 


[K 


/T 

_0<T>  v  <34>e> 

e  ik j 1  3xx 


dV 


(F 


i>  -  E  [TejT  /<oT  h 

e  =  l  Ve 

*  i  dV  /  <*»>TTi 

b=l  V 


dV 


dS 


(A. 15) 


(A. 16) 


(A. 17) 


(A. 18) 


(A. 19) 


Since  the  virtual  displacements  are  arbitrary,  the  discre¬ 
tized  version  of  the  virtual  work  expression,  Eq.  (A. 16) 
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reduces  to 


[M]  J  u  i  C  t )  |  +  [Ki.]J.U.(t)|  =  |F.(t)J 

from  which  we  obtain  liq .  (2.10)  in  Section  2.4. 


(A. 20) 


A  considerable  reduction  in  the  computing  effort  can 
be  achieved  by  diagonalizing  the  mass  matrix  [M] ,  i.e.,  by 
lumping  the  element  masses  at  the  node  points.  One  way  ol 
diagonalizing  { M ]  is  to  set  each  diagonal  term  equal  to 

the  sum  terms  in  the  corresponding  row.  Many  studies  have 
been  conducted  to  examine  the  effects  that  lumping  the  masses 
at  the  nodes  has  on  the  resulting  calculations.  The  general 
concensus  is  that  little  if  any  advantage  is  gained  by  car ly¬ 
ing  the  nondiagonal  mass  matrix,  and  considerable  computa¬ 
tional  advantage  is  gained  by  diagonalizing  [M] . 


A .  3  ISOPARAMETRIC  ELEMENTS 

Numerical  refinement  and  computational  efficiency  is 
ol  the  utmost  importance  in  treating  stress  waves  in  three- 
dimensional  solids.  For  this  reason,  highly  sophisticated 
three-dimensional  elements  are  being  considered.  Figure  A.l 
illustrates  the  basic  tetrahedral  element  that  is  presently 
coded  into  STATIC,  one  of  S3,s  static  finite  element  analyzers. 
I ho  element  is  of  comparable  accuracy  to  a  low  order  differenc¬ 
ing  scheme,  with  properties  similar  to  the  basic  trian^uia. 
clement  used  for  two-dimensional  analysis.  The  tetrahedral 
shape  is  ideal  for  simulating  irregular  geometries  as  has 
been  demonstrated  by  Frazier  (1969). 

Both  two-  and  three-dimensional  isoparametric  elements 
are  presently  being  coded  into  STATIC.  The  remaining  portion 
ot  this  section  briefly  describes  how  these  higher  order 
elements  are  being  developed  in  three-dimensional  space. 
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V 


Inner-element  Displacements : 


Uj(x)  =  cfP>  ♦  cP>x  ♦  c[2)x  ♦  cP)x 

1  1  *  2  1  3 

"  <*(x)>|u.J 


A‘*:-B*sic  tet rahedron  element  that  permits 
linear  displacements  within  the  region  of  the  element 

boundar iesC°nt inU°US  dlsPlacements  across  the  element 


As  indicated  by  Figs.  A. 2  and  A. 3,  a  skewed  hexahedron 
representing  a  typical  finite  element  is  mapped  into  a  two- 
unit  cube.  This  mapping,  which  is  not  necessarily  conformal, 
is  defined  by  the  expression 


xi  ■  !xij 


(A. 21) 


in  which  j |  is  a  column  listing  of  the  coordinates  of  the 
node  points  associated  with  element  e  expressed  in  the  global 
Cartesian  system  t^,  and  ^4>e(n)^>  is  a  row  listing  of  spatial 
interpolating  functions  expressed  in  the  local  coordinate  sys¬ 
tem  n^.  Specifically,  the  spatial  interpolation  functions 
for  the  linear  isoparametric  element  of  Fig.  A. 2  are  given  by 

4>e(n)  =  (l+n  )  (l  +  n  )  ( l  +  n  )/8 

1  ~  1  2  3 

<J>e(n)  «  (1-n  )  (l  +  n  )  (l  +  n  )/8 

2  -  1  2  3 

4>G(n)  =  (l  +  n  )  (1-n  )  (l  +  n  ) / 8 

3  ~  1  2  3 

•ith  similar  expressions  for  4>e(n)  ...  4>e(n)  ,  (A. 22) 

and  the  spatial  interpolation  functions  for  the  quadratic  iso¬ 
parametric  element  of  Fig.  A. 3  are  given  by 

t®(n)  =  (l+n  )  (l+n  )  (l+n  )  (n  +n  +n  -  2)/8 

1  "  1  2  3  12  3 


<t>®(n)  =  (1-n  )  (l+n  )  (l  +  n  '  (-n  +n  +n  -  2 ) / 8 
2  1  2  2  12  3 

with  similar  expressions  for  <PG(n)  ...  4>e(n) 

3  8  ~ 
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(a)  Global  coordinates  of  the 
assembled  solid. 


f'  1  Local  natural  coordinates 
of  the  element. 


Inner-element  Displacements  (linear): 

-  EEE n“ .a 

a=0  p  -  0  y  '  *1 

=  <4’(n)>}  il  [ 


Fig.  A.2--Basic  isoparametric  hexahedron  element  that 
permits  linear  displacements  within  the  region  of  the 
element  and  gives  continuous  displacements  across 
element  boundaries. 


(a)  Global  coordinate's  of  the 
assembled  solid. 


(b)  Local  natural  coordinates 
of  the  element. 


Inner-element  Displacements  (quadratic): 


*<♦  Cm  )>  juj 


Fig.  A. 3- -Curved  isoparametric  element  that  permits 
quadratic  displacements  within  the  region  of  the 
element  and  gives  continuous  displacements  across 
element  boundaries. 
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4>e(n)  =  C l - n  )2(l+n  )  (l  +  n  )/4 

9  -  1  2  3 

(n)  =  (l-n  ) 2 (l-n  )  (i+n  )/4 

1  0  ~  1  2  3 

with  similar  expressions  for  $e  (n)  ...  <j>e  (n)  (A.  23) 

1  1  -  2  0- 

I  bus  Lq.  (A. 21)  yields,  point  by  point,  the  global 
coordinate  value  corresponding  to  each  local  coordinate 

value  rjj  .  When  n j  is  equal  to  an  element  node  point 
(^.1  »^1)  »  will  take  on  the  value  of  the  corresponding  node 

point  in  the  global  system.  The  same  requirement  is  placed 
on  the  displacement  field:  When  the  displacement  field  is 
evaluated  at  an  element  node  point,  the  corresponding  nodal 
displacement  is  obtained.  Consequently,  the  same  spatial 
interpolation  functions  are  used  for  the  displacement  fielo. 

\(n)  =  |U?|  (A. 24) 


Spatial  derivatives  with  respect  to  the  global  Carte¬ 
sian  coordinates  are  expressed  in  terms  local  coordinate 
derivatives  using  the  chain  rule  for  differentiation, 


< 


d_\  /Vi 

9xk  \9rm 


(A. 25) 


We  note  that  3 nm/ 3x ^  is  not  known  explicitly,  but  its  in¬ 
verse,  9XjV3nm,  is  computed  directly  from  Eq.  (A. 21), 


I  ve  ! 
I  K  \ 


(A. 26) 


The  volume  integration  that  is  involved  in  defining 
element  properties,  Eqs.  (A. 17),  (A. 18),  and  (A. 19),  is  carried 
out  in  the  local  coordinate  system  over  the  two-unit  cube  using 
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the  property 


dx  dx  dx  =  J(n)dn  dn  dn 

12  3  ~  1  2  3 


(A. 27) 


where  the  transformation  Jacobian  J  is  simply  the  determinate 
of  9x^/3n  of  Eq.  (A. 26),  i.e., 


J(n)  =  Det  /^f.  (n)\  1 X? 


(A. 28) 


Using  the  appropriate  spatial  interpolation  functions  given  by 
Eqs.  (A. 22)  or  (A. 23),  the  element  mass  matrix,  as  defined  in 
Eq.  (A. 17),  is  given  by 


+1  +1  +1 


1  =  f  f  f  <^e  (n))>  P  <^4>c  (n)^> 

- 1  J-1  J-l 


J  dn  dn  dn  CA.29) 

1  2  3 


and  the  element  stiffness  matrix,  as  defined  by  Eq.  (A. 18), 
is  given  by 


+  1  +1 


[Ke.] 

ij 


f,  l  l  ( 


3xk  V  7A<Ee  ,  At  ■'d^  ,  \ 

V\  (~7  1;ikjlVnn  ‘-V 


<£)' 


J  dn  dn  dn 

1  2  3 


(A. 30) 


The  actual  integration  is  carried  out  numerically  using  the 
Gaussian  quadrature  integration  scheme. 
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APPENDIX  B 


ARTIFICIAL  DAMPING 


B . 1  CAUSES  OF  NUMERICAL  OSCILLATIONS 


In  the  absence  of  numeiical  clamping,  computer  calcula¬ 
tions  for  the  propagation  of  discontinuous  wave  forms  contain 
oscillations  that  die  out  away  from  the  propagating  discon¬ 
tinuity,  as  illustrated  in  Figs.  3.6  and  3.7.  These  nuisance 
oscillations  are  the  result  of  two  phenomena.  One  is  not  re¬ 
lated  to  the  propagating  nature  cf  the  wave  but  is  simply 
due  to  the  truncation  error  of  approximating  a  discontinuous 
function  by  piecewise  polynomials  of  low  order.  The  result 
is  quite  analogous  to  the  Gibb's  phenomena  of  Fourier  Analy¬ 
sis  —  oscillations  in  the  vicinity  of  discontinuities.  Just 
as  with  the  oscillations  from  a  truncated  Fourie”  series,  a 
nonoscillating  signal  can  be  obtained  by  suppressing  the 
participation  of  the  higher  wave  numbers  (short  wave  length) 
in  a  transitional  manner  so  that  the  highest  wave  number  is 
suppressed  the  most. 

The  other  source  of  the  numerical  oscillations  results 
from  numerical  dispersion  —  the  higher  wave  numbers  tend  to 
propagate  at  a  velocity  slightly  different  from  the  low  wave 
numbers,  which  propagate  at  the  velocity  of  the  medium.  When 
the  consistent  mass  matrix  of  the  FE  method  is  employed,  the 
higher  wave  numbers  propagate  at  a  higher  velocity  than  that 
of  the  medium.  This  fact  is  the  direct  result  of  the  eigen¬ 
values  (natural  frequencies)  of  the  discrete  system  being 
bounded  from  below  according  to  the  Rayleigh  quotient, 

Washizu  (1968).  The  result  of  the  high  wave  numbers 

propatating  at  a  slightly  higher  velocity  is  seen  in  work  by 
Goudreau  (1970). 

The  dispersion  from  the  lumped-mass  approximation  of 
FE,  which  is  equivalent  to  the  FD  method,  is  not  entirely 
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one-sided.  The  eigenvalues  obtained  by  using  the  lumped 
mass  approx  mation  can  lie  both  above  and  below  the  correspond¬ 
ing  eigenvalues  for  the  bounded  continuum.  The  preponderance 
of  dispersion,  however,  is  of  the  type  where  the  higher  wave 
numbers  propagate  at  a  slightly  lower  velocity  than  the  velo¬ 
city  for  the  medium. 

The  elimination  of  the  dispersion- induced  oscillations  is 
similar  to  that  for  the  nondynamic  oscillations  —  suppress 
the  higher  wave  numbers  in  a  transitional  manner.  The  suppres¬ 
sion  of  the  higher  modes  needs  to  be  more  pronounced  as  time 
increases  in  the  calculations,  because  numerical  dispersion 
causes  the  higher  modes  to  become  out  of  phase  with  the  wave  front 
b'/  an  ever-increasing  amount.  Therefore,  a  steep  wave  front 
will  necessarily  deteriorate  somewhat  at  late  times.  In 
Fig.  3.8,  we  observe  that  a  velocity  step  (particle  velocitv) 
spreads  over  about  10  zones  after  propagating  through  160 
zones  . 

It  is  appropriate  to  note  that  earth  materials  display 
damping  properties.  Low  amplitude  seismic  waves  are  attenua¬ 
ted  in  a  frequency  dependent  manner  —  high  frequency  waves 
(short  wave  lengths)  are  attenuated  more  rapidly  than  low 
frequency  waves  (long  wave  lengths).  Also,  high  amplitude 
waves  passing  through  a  small  block  of  material  trace  out 
a  hysteresis  loop  when  stress  is  plotted  against  strain.  The 
area  of  the  loop  gives  the  energy  density  that  is  lost  in  the 
nonlinear  dynamic  process.  The  type  of  artificial  damping 
that  is  developed  below  to  remove  numerical  oscillations  also 
results  in  a  hysteresis  loop,  which  is  an  ellipse.  The 
numerical  viscous  damping  parameter  can  be  adjusted  to 
cause  the  area  of  the  elliptical  hysteresis  loop  to  be  equal 
to  the  area  that  results  from  the  nonlinear  earth  material 
so  that  the  energy  loss  of  the  linear  system  becomes  equal 
to  the  energy  loss  from  the  nonlinear  material.  Care  must 
be  exercised  in  this  procedure  to  assure  the  two  systems  are 
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made  equivalent  at  the  predominant  frequency  of  the  propagat¬ 
ing  waves,  because  the  energy  that  is  dissipated  per  cycle  of 
ground  motion  may  have  a  different  frequency  dependence  for 
the  two  systems. 

B- 2  MATHEMATICAL  ANALYSIS  OF  ARTIFICIAL  DAMPING 

In  this  development  we  will  treat  three  types  of  arti¬ 
ficial  damping;  each  has  a  different  frequency  dependence. 

The  damping  matrix  of  Section  2.4  is  set  to 


£C]  =  a  [M]  +  25  [KM]  +  g[K] 


where  [M]  = 

6idMi  • 

The  matrix 

square  root  [KM]'2 

is  de 

fined  in  terms  of  the 

eigenvectors 

(mode  shapes)  {$} 

—  m 

%»  m  =  1,2,  ...  3N 

and 

eigenvalues 

(natural 

frequencies) 

for 

the  discrete 

system 

3N 

[KM] 

*  -  [Ml  £ 

m=l 


where  the  3N  eigenvalues  and  corresponding  eigenvectors  are 
computed  from  the  equation 

[K]U}m  =  w*[M]U]m  ,  m  =  1,2,  ...  3N  (B.3) 


and  Mm  is  the  normalization  constant  for  scaling  the  m— 
eigenvector.  Orthogonality  between  the  various  eigenvectors 
is  expressed 


M  <5 
m  nm 


The 
i .  e 


eigenvalues  are 


,tu  <  0)  ,  . 

’  m  -  m+1 * 


conveniently  arranged 


in  ascending  order, 
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The  expression  for  the  artificial  damping  matrix  above, 

Eq.  (B.l),  is  substituted  into  the  linearized  discrete  equa¬ 
tion  of  motion,  Eq.  (2.10),  to  obtain 

[  M  ]  { U  ( t ) }  +  [aM  +  2  c/p  +  BK]{U(t)}  +  [K]{U(t)>  =  {F(t)J. 

(B.4) 


For  the  purpose  of  conciseness,  we  will  not  develop 
modal  decomposition  for  systems  in  which  nodal  displacements 
are  specified;  we  will  restrict  our  development  to  systems 
in  which  surface  tractions  are  specified  on  the  boundaries. 

For  the  case  involving  no  specified  nodal  displacements,  the 
eigenvectors  form  a  complete  set  of  basic  functions  for  ex¬ 
pressing  the  nodal  displacements  at  an  instant  in  time.  That 
is,  the  nodal  displacements  can  be  expressed  as  a  time  vary¬ 
ing  linear  combination  of  the  eigenvectors  for  the  discrete 
system 

3N 

(U(t)}  =  'jT  Tk(t)  H)k  (B.S) 

k=l 


where  T, (t)  is  the  time 
Kth 

for  the  k —  eigenvector. 


varying  participation  coefficient 


The  above  eigenvector  expansion  for  the  nodal  displace¬ 
ments  is  substituted  into  Eq.  (B.4).  The  resulting  simul¬ 
taneous  equations,  expressed  in  matrix  form,  are  then  de¬ 
coupled  by  premultiplying  through  by  M~  {4>}^  to  obtain 


The  more  general  case  involving  specified  nodal  displace- 

,hlSJ°r»eS  haS  been  develoPed  elsewhere  by  Frazier 
(1969),  also  by  Przemieniecki  (1968).  The  more  general 
development  requires  additional  notation  which  does  not  add 
to  our  understanding  the  influence  of  the  various  types  of 


L 
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where 


Vc>  *  2c»V*>  *  ■£*„<*>  -  «!„(») 


(B.  6) 


c  =  a/2  +  £oo  +  8/ 2ai? 
n  n  n 

and 

QnCt)  -  (lift))  . 

In  obtaining  the  decoupled  equation  above,  we  have  used  the 
substitution 

3N 

ir'ujJiKM]'’  £  tk(t)u)k  -  %tn(t) 

0  =  1 

from  Eq.  (B.2).  Also,  we  have  substituted 

3N 

k=l 


using  Eq.  (B.3)  . 

The  general  solution  for  the  n—  participation  co¬ 
efficient  is  now  expressed  using  a  convolution  integral  in 
time 

Tn(t)  -  o  Cnt  cost^t  Tn(0) 


*  ^  '  C"'  (fn(0)  *  CnTn‘°>)  si”“dn‘ 

1  f  'cn(t'T) 

+  J  e  sina)dn(t-T)Qn(T)dT  (B.7) 

dn  0 
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■HlhliiMII  If  tti 


when 


2  2 

GO  “  c 

n  n 


Tn(°)  =  Mn1{l}i[M]{U(0)} 


tn(^3  =  Mn^i)n[M]{U(0)} 

(U(0)}  and  (U CO) }  being  the  initial  conditions  for  the 
nodal  displacements  and  velocities,  respectively. 

From  Eq.  (B.7),  we  can  see  how  the  various  types  of 
damping  influence  the  dynamic  response  to  a  linear  system. 
First,  we  see  that  the  natural  frequencies  are  reduced  by 
the  factor  yi  -  c2/uj£  <  1  which  may  result  in  additional 
dispersion,  particularly  in  the  higher  nodes  where  c  is 
the  greatest.  More  importantly,  however,  we  find  that  the 
free  vibrations  that  result  from  excitation  at  t  =  0  are 
damped  by  the  factor 


-c  t 
n 


-Cwt 
_  n 
e  e 


fu>2t 

2  n 


(B .  8 ) 


Thus,  we  have  shown  that  the  term  a  damps  all  eigenvectors 
equally,  5  damps  eigenvectors  an  amount  proportional  to 
their  respective  natural  frequencies,  and  6  damps  as  the 
square  of  the  natural  frequencies.  Incidentally,  the  6 
damping,  which  has  been  incorporated  into  the  ILLIAC  time 
stepping  algorithm,  results  in  damping  only  in  elements 
undergoing  a  strain  rate;  those  elements  experiencing  rigid 
body  deformations  experience  no  damping. 
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B . 3  ESTIMATING  AN  OPTIMUM  DAMPING  COEFFICIENT 


The  B-type  damping  is  most  appropriate  for  the  numeri¬ 
cal  calculations  and  is  used  in  the  time  stepping  algorithm 
of  Section  2.4.  We  will  now  develop  a  technique  for  pre¬ 
dicting  values  of  B  for  use  in  numerical  calculations. 

Equation  (B.8)  permits  us  to  set  B  to  damp  a  specified 
eigenvector  any  desired  amount.  First,  let  us  relate  eigen¬ 
vectors,  eigenvalues,  and  wave  lengths.  For  1-D  geometry  (or 
plane  waves),  an  eigenvector,  characterized  by  a  wave  length 
X  ,  has  an  associated  e igenf requency  given  by 


0) 


n 


(B.9) 


where  c  ”  V  for  propagating  P  waves  and  c  =  Vg  for 
propagating  S  waves.  We  note  that  this  expression  is  used 
only  to  estimate  the  natural  frequencies  of  the  discrete  sys¬ 
tem.  Our  estimate  for  w  is  substituted  into  (B.8)  with 
a  =  c  =  0  to  obtain  a  damping  factor 


-c  t  -2tt2c2B  t / A 2 
n  _  „  n 


(B.10) 


Thus,  to  select  an  optimum  B  for  a  particular  calculation 
we  must  establish  three  quantities: 

1.  The  wave  length  of  the  spurious  oscillations 

that  are  to  be  damped.  From  Fig.  3.6,  we  find 
spurious  oscillations  with  wave  lengths  up  to 
about  six  grid  dimensions,  i.e.,  =  6AX. 

2.  The  time  at  which  the  oscillations  should  dis¬ 
appear.  This  presupposes  that  we  are  willing 
to  tolerate  some  oscillations  in  the  very  early 
stages  of  the  calculations.  Let  us  consider 

the  case  where  At  =  and  the  oscillations 

P 


are  to  be  removed  by  the  time  a  P  wave 
crosses  50  grids,  i.e., 


t  =  100  At  =  50 

P 


3.  A  damping  factor 


-c  t  -  ^  U) 2 1  . 

n  n  “2  f^i7r 

e  =  e  =  e  =0.135 


is  sufficient  to  remove  the  oscillations. 

The  optimum  value  of  0  for  the  job  is  then  calculated 
from  Hq.  (B.10) 

-2ttc20  t/X2  „ 

e  "  -  e'2  (B. 11) 


so  that 


3p/At  =  0.15  (B. 12) 

for  P  waves.  Similarly,  we  compute  an  optimum  0  for 

eliminating  spurious  oscillations  after  S  waves  have 

traversed  50  grid  dimensions  to  obtain  0  =  30  for 

S  P 

V  /V  =  /o  .  It  is  interesting  to  note  that  the  test  cal- 
p  s  b 

culations  presented  in  Section  3.3.  Fig.  3.7,  indicate 
0.1  <  0/At  <  0.2  for  the  P-wave  calculations.  For  propagat¬ 
ing  S  waves  we  got  0/At  =  0.5. 
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